HUMAN ORPHAN G PROTEIN-COUPLED RECEPTORS

This patent document claims priority benefit of each of the following applications,
all filed with the United States Patent and Trademark Office via U.S. Express Mail on the
indicated filing dates: U.S. Provisional Number 60/121,852, filed February 26, 1999,
claiming the benefit of U.S. Provisional Number 60/109,213, filed November 20, 1998;
U.S. Provisional Number 60/120,416, filed February 16, 1999; U.S. Provisional Number
60/123,946, filed March 12, 1999; U.S. Provisional Number 60/123,949, filed March 12,
1999; U.S. Provisional Number 60/136,436, filed May 28, 1999; U.S. Provisional
Number 60/136,439, filed May 28, 1999; U.S. Provisional Number 60/136,567, filed May
28, 1999; U.S. Provisional Number 60/137,127, filed May 28, 1999; U.S. Provisional
Number 60/137,131, filed May 28, 1999; U.S. Provisional Number 60/141,448, filed June
29, 1999, claiming priority from U.S. Provisional Number 60/136,437, filed May 28,
1999; U.S. Provisional Number (Arena Pharmaceuticals, Inc. docket number
CHN10-1), filed September 29, 1999; U.S. Provisional Number 60/156,333, filed
September 29, 1999; U.S. Provisional Number 60/156,555, filed September 29, 1999;
U.S. Provisional Number 60/156,634, filed September 29, 1999; U.S. Provisional
Number (Arena Pharmaceuticals, Inc. docket number RUP6-1), filed October 1,
1999; U.S. Provisional Number (Arena Pharmaceuticals, Inc. docket number
RUP7-1), filed October 1, 1999; U.S. Provisional Number (Arena
Pharmaceuticals, Inc. docket number CHN6-1), filed October 1, 1999; U.S. Provisional
Number (Arena Pharmaceuticals, Inc. docket number RUP5-1), filed October 1,
1999; U.S. Provisional Number (Arena Pharmaceuticals, Inc. docket number
CHN9-1), filed October 1, 1999. This patent document is related to U.S. Serial Number
09/170,496, filed October 13, 1998, and U.S. Serial Number unknown (Woodcock
Washburn Kurtz Mackiewicz & Norris, LLP docket number AREN-0054), filed on
October 12, 1999 (via U.S. Express Mail), both being incorporated herein by reference.
This patent document also is related to U.S. Serial No. 09/364,425, filed July 30, 1999,
which is incorporated by reference in its entirety. This application also claims priority
to U.S. Serial Number (Woodcock Washburn Kurtz Mackiewicz & Norris, LLP
docket number AREN-0050), filed on October 12, 1999 (via U.S. Express Mail),
incorporated by reference herein in its entirety. Each of the foregoing applications is
incorporated herein by reference in its entirety.

FIELD OF THE INVENTION 
The invention disclosed in this patent document relates to transmembrane receptors,
and more particularly to endogenous, orphan, human G protein-coupled receptors
("GPCRs").

BACKGROUND OF THE INVENTION 

Although a number of receptor classes exist in humans, by far the most abundant and
therapeutically relevant is represented by the G protein-coupled receptor (GPCR or GPCRs)
class. It is estimated that there are some 100,000 genes within the human genome, and of
these, approximately 2%, or 2,000 genes, are estimated to code for GPCRs. Receptors,
including GPCRs, for which the endogenous ligand has been identified are referred to as
"known" receptors, while receptors for which the endogenous ligand has not been identified
are referred to as "orphan" receptors. GPCRs represent an important area for the
development of pharmaceutical products: from approximately 20 of the 100 known GPCRs,
60% of all prescription pharmaceuticals have been developed. This distinction is not merely
semantic, particularly in the case of GPCRs. Thus, the orphan GPCRs are to the
pharmaceutical industry what gold was to California in the late 19th century - an opportunity
to drive growth, expansion, enhancement and development.

GPCRs share a common structural motif. All these receptors have seven sequences
of between 22 and 24 hydrophobic amino acids that form seven alpha helices, each of which
spans the membrane (each span is identified by number, i.e., transmembrane-1 (TM-1),
transmembrane-2 (TM-2), etc.). The transmembrane helices are joined by strands of amino
acids between transmembrane-2 and transmembrane-3, transmembrane-4 and
transmembrane-5, and transmembrane-6 and transmembrane-7 on the exterior, or
"extracellular" side, of the cell membrane (these are referred to as "extracellular" regions 1,
2 and 3 (EC-1, EC-2 and EC-3), respectively). The transmembrane helices are also joined
by strands of amino acids between transmembrane-1 and transmembrane-2, transmembrane-3
and transmembrane-4, and transmembrane-5 and transmembrane-6 on the interior, or
"intracellular" side, of the cell membrane (these are referred to as "intracellular" regions 1,
2 and 3 (IC-1, IC-2 and IC-3), respectively). The "carboxy" ("C") terminus of the receptor
lies in the intracellular space within the cell, and the "amino" ("N") terminus of the receptor
lies in the extracellular space outside of the cell.

Generally, when an endogenous ligand binds with the receptor (often referred to as
"activation" of the receptor), there is a change in the conformation of the intracellular region
that allows for coupling between the intracellular region and an intracellular "G-protein." It
has been reported that GPCRs are "promiscuous" with respect to G proteins, i.e., that a
GPCR can interact with more than one G protein. See, Kenakin, T., 43 Life Sciences 1095
(1988). Although other G proteins exist, currently, Gq, Gs, Gi, and Go are G proteins that
have been identified. Endogenous ligand-activated GPCR coupling with the G-protein
begins a signaling cascade process (referred to as "signal transduction"). Under normal
conditions, signal transduction ultimately results in cellular activation or cellular inhibition.
It is thought that the IC-3 loop as well as the carboxy terminus of the receptor interact with
the G protein.

Under physiological conditions, GPCRs exist in the cell membrane in equilibrium
between two different conformations: an "inactive" state and an "active" state. A receptor
in an inactive state is unable to link to the intracellular signaling transduction pathway to
produce a biological response. Changing the receptor conformation to the active state allows
linkage to the transduction pathway (via the G-protein) and produces a biological response.
A receptor may be stabilized in an active state by an endogenous ligand or a compound such
as a drug.

SUMMARY OF THE INVENTION 

Disclosed herein are human endogenous orphan G protein-coupled receptors. 

BRIEF DESCRIPTION OF THE DRAWINGS 
Figures 1A and 1B provide reference "grids" for certain dot-blots provided herein
(see also, Figures 2A and 2B, respectively).

Figures 2A and 2B provide reproductions of the results of certain dot-blot analyses
resulting from hCHN3 and hCHN8, respectively (see also, Figures 1A and 1B, respectively).
Figure 3 provides a reproduction of the results of RT-PCR analysis of hRUP3. 




WO 00/31258 PCT/US99/23687



Figure 4 provides a reproduction of the results of RT-PCR analysis of hRUP4.
Figure 5 provides a reproduction of the results of RT-PCR analysis of hRUP6. 

DETAILED DESCRIPTION 
The scientific literature that has evolved around receptors has adopted a number of
terms to refer to ligands having various effects on receptors. For clarity and consistency, the
following definitions will be used throughout this patent document. To the extent that these
definitions conflict with other definitions for these terms, the following definitions shall
control:

AMINO ACID ABBREVIATIONS used herein are set out in Table 1:



                TABLE 1

ALANINE          ALA    A
ARGININE         ARG    R
ASPARAGINE       ASN    N
ASPARTIC ACID    ASP    D
CYSTEINE         CYS    C
GLUTAMIC ACID    GLU    E
GLUTAMINE        GLN    Q
GLYCINE          GLY    G
HISTIDINE        HIS    H
ISOLEUCINE       ILE    I
LEUCINE          LEU    L
LYSINE           LYS    K
METHIONINE       MET    M
PHENYLALANINE    PHE    F
PROLINE          PRO    P
SERINE           SER    S
THREONINE        THR    T
TRYPTOPHAN       TRP    W
TYROSINE         TYR    Y
VALINE           VAL    V



COMPOSITION means a material comprising at least one component. 

ENDOGENOUS shall mean a material that a mammal naturally produces. 
ENDOGENOUS in reference to, for example and not limitation, the term "receptor," shall
mean that which is naturally produced by a mammal (for example, and not limitation, a
human) or a virus. By contrast, the term NON-ENDOGENOUS in this context shall mean
that which is not naturally produced by a mammal (for example, and not limitation, a human)
or a virus.

HOST CELL shall mean a cell capable of having a Plasmid and/or Vector
incorporated therein. In the case of a prokaryotic Host Cell, a Plasmid is typically replicated
as an autonomous molecule as the Host Cell replicates (generally, the Plasmid is thereafter
isolated for introduction into a eukaryotic Host Cell); in the case of a eukaryotic Host Cell,
a Plasmid is integrated into the cellular DNA of the Host Cell such that when the eukaryotic
Host Cell replicates, the Plasmid replicates. Preferably, for the purposes of the invention
disclosed herein, the Host Cell is eukaryotic, more preferably, mammalian, and most
preferably selected from the group consisting of 293, 293T and COS-7 cells.

LIGAND shall mean an endogenous, naturally occurring molecule specific for an 
endogenous, naturally occurring receptor. 

NON-ORPHAN RECEPTOR shall mean an endogenous naturally occurring
molecule specific for an endogenous naturally occurring ligand wherein the binding of a
ligand to a receptor activates an intracellular signaling pathway.

ORPHAN RECEPTOR shall mean an endogenous receptor for which the 
endogenous ligand specific for that receptor has not been identified or is not known. 

PLASMID shall mean the combination of a Vector and cDNA. Generally, a Plasmid
is introduced into a Host Cell for the purposes of replication and/or expression of the cDNA
as a protein.

VECTOR in reference to cDNA shall mean a circular DNA capable of incorporating
at least one cDNA and capable of incorporation into a Host Cell.

The order of the following sections is set forth for presentational efficiency and is
not intended, nor should be construed, as a limitation on the disclosure or the claims to
follow.



A. Identification of Human GPCRs

The efforts of the Human Genome Project have led to the identification of a plethora of
information regarding nucleic acid sequences located within the human genome; it has
been the case in this endeavor that genetic sequence information has been made available
without an understanding or recognition as to whether or not any particular genomic
sequence does or may contain open-reading frame information.
Several methods of identifying nucleic acid sequences within the human genome are within
the purview of those having ordinary skill in the art. For example, and not limitation, a
variety of GPCRs disclosed herein were discovered by reviewing the GenBank database,
while other GPCRs were discovered by utilizing a nucleic acid sequence of a GPCR,
previously sequenced, to conduct a BLAST™ search of the EST database. Table A, below,
lists the disclosed endogenous orphan GPCRs along with a GPCR's respective homologous
GPCR:



                                   TABLE A

Disclosed   Accession      Open Reading   Percent Homology         Reference To
Human       Number         Frame          To Designated            Homologous GPCR
Orphan      Identified     (Base Pairs)   GPCR                     (Accession No.)
GPCRs

hARE-3      AL033379       1,260 bp       52.3% LPA-R              U92642
hARE-4      AC006087       1,119 bp       36% P2Y5                 AF000546
hARE-5      AC006255       1,104 bp       32% Oryzias latipes      D43633
hGPR27      AA775870       1,128 bp
hARE-1      AI090920       999 bp         43% KIAA0001             D13626
hARE-2      AA359504       1,122 bp       53% GPR27
hPPR1       H67224         1,053 bp       39% EBI1                 L31581
hG2A        AA754702       1,113 bp       31% GPR4                 L36148
hRUP3       AL035423       1,005 bp       30% Drosophila           2133653
hRUP4       AI307658       1,296 bp       32% pNPGPR;              NP_004876;
                                          28% and 29% Zebra fish   AAC41276 and
                                          Ya and Yb, respectively  AAB94616
hRUP5       AC005849       1,413 bp       25% DEZ                  Q99788
hRUP6       AC005871       1,245 bp       23% FMLPR                P21462
hRUP7       AC007922       1,173 bp       48% GPR66;               NP_006047;
                                          43% H3R                  AF140538
hCHN3       EST 36581      1,113 bp       53% GPR27
hCHN4       AA804531       1,077 bp       32% thrombin             4503637
hCHN6       EST 2134670    1,503 bp       36% edg-1                NP_001391
hCHN8       EST 764455     1,029 bp       47% KIAA0001             D13626
hCHN9       EST 1541536    1,077 bp       41% LTB4R                NM_000752
hCHN10      EST 1365839    1,055 bp       35% P2Y                  NM_002563


Receptor homology is useful in terms of gaining an appreciation of a role of the 
disclosed receptors within the human body. Additionally, such homology can provide insight 
as to possible endogenous ligand(s) that may be natural activators for the disclosed orphan 
GPCRs. 

B. Receptor Screening 

Techniques have become more readily available over the past few years for
endogenous-ligand identification (this, primarily, for the purpose of providing a means of
conducting receptor-binding assays that require a receptor's endogenous ligand) because the
traditional study of receptors has always proceeded from the a priori assumption (historically
based) that the endogenous ligand must first be identified before discovery could proceed to
find antagonists and other molecules that could affect the receptor. Even in cases where an
antagonist might have been known first, the search immediately extended to looking for the
endogenous ligand. This mode of thinking has persisted in receptor research even after the
discovery of constitutively activated receptors. What has not been heretofore recognized is
that it is the active state of the receptor that is most useful for discovering agonists, partial
agonists, and inverse agonists of the receptor. For those diseases which result from an overly
active receptor or an under-active receptor, what is desired in a therapeutic drug is a
compound which acts to diminish the active state of a receptor or enhance the activity of the
receptor, respectively, not necessarily a drug which is an antagonist to the endogenous ligand.
This is because a compound that reduces or enhances the activity of the active receptor state
need not bind at the same site as the endogenous ligand. Thus, as taught by a method of this
invention, any search for therapeutic compounds should start by screening compounds
against the ligand-independent active state.

As is known in the art, GPCRs can be "active" in their endogenous state even without
the binding of the receptor's endogenous ligand thereto. Such naturally-active receptors can
be screened for the direct identification (i.e., without the need for the receptor's endogenous
ligand) of, in particular, inverse agonists. Alternatively, the receptor can be "activated" via
mutation of the receptor to establish a non-endogenous version of the receptor that is
active in the absence of the receptor's endogenous ligand.

Screening candidate compounds against an endogenous or non-endogenous,
constitutively activated version of the human orphan GPCRs disclosed herein can provide
for the direct identification of candidate compounds which act at this cell surface receptor,
without requiring use of the receptor's endogenous ligand. By determining areas within
the body where the endogenous version of human GPCRs disclosed herein is expressed
and/or over-expressed, it is possible to determine related disease/disorder states which are
associated with the expression and/or over-expression of the receptor; such an approach is
disclosed in this patent document.

Creation of a mutation that may evidence constitutive activation of the
human orphan GPCRs disclosed herein is based upon the distance from the proline residue
that is presumed to be located within TM6 of the GPCR, typically near the TM6/IC3
interface (such proline residue appears to be quite conserved). By mutating the amino acid
residue located 16 amino acid residues from this residue (presumably located in the IC3
region of the receptor) to, most preferably, a lysine residue, such activation may be obtained.
Other amino acid residues may be useful in the mutation at this position to achieve this
objective.
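
The positional rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the text does not state the direction of the 16-residue offset, so the sketch assumes it runs toward the N-terminus (because IC3 precedes TM6 in the primary sequence); the function name, the toy sequence and the proline position are hypothetical and are not taken from any disclosed receptor.

```python
def mutate_relative_to_proline(sequence: str, proline_pos: int,
                               offset: int = 16, new_residue: str = "K") -> str:
    """Replace the residue `offset` positions N-terminal to a given proline.

    `proline_pos` is 1-based. The N-terminal direction is an assumption
    (IC3 lies before TM6 in the primary sequence); it is not stated in the text.
    """
    if sequence[proline_pos - 1] != "P":
        raise ValueError("residue at proline_pos is not a proline")
    target = proline_pos - offset  # 1-based position of the residue to mutate
    if target < 1:
        raise ValueError("offset reaches past the N-terminus")
    return sequence[:target - 1] + new_residue + sequence[target:]


# Hypothetical toy sequence: 20 alanines, one proline at position 21, 5 alanines.
toy_sequence = "A" * 20 + "P" + "A" * 5
mutant = mutate_relative_to_proline(toy_sequence, proline_pos=21)
```

With a real receptor, the position of the conserved TM6 proline would have to come from an alignment or annotation of that receptor; the sketch only performs the arithmetic and the substitution.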

C. Disease/Disorder Identification and/or Selection 

Preferably, the DNA sequence of the human orphan GPCR can be used to make a
probe for (a) dot-blot analysis against tissue-mRNA, and/or (b) RT-PCR identification of
the expression of the receptor in tissue samples. The presence of a receptor in a tissue
source, or a diseased tissue, or the presence of the receptor at elevated concentrations in
diseased tissue compared to a normal tissue, can be preferably utilized to identify a
correlation with a treatment regimen, including but not limited to, a disease associated
with that disease. Receptors can equally well be localized to regions of organs by this
technique. Based on the known functions of the specific tissues to which the receptor is
localized, the putative functional role of the receptor can be deduced.
D. Screening of Candidate Compounds
1. Generic GPCR screening assay techniques

When a G protein receptor becomes constitutively active (i.e., active in the absence
of endogenous ligand binding thereto), it binds to a G protein (e.g., Gq, Gs, Gi, Go) and
stimulates the binding of GTP to the G protein. The G protein then acts as a GTPase and
slowly hydrolyzes the GTP to GDP, whereby the receptor, under normal conditions, becomes
deactivated. However, constitutively activated receptors continue to exchange GDP to GTP.
A non-hydrolyzable analog of GTP, [35S]GTPγS, can be used to monitor enhanced binding
to membranes which express constitutively activated receptors. It is reported that
[35S]GTPγS can be used to monitor G protein coupling to membranes in the absence and
presence of ligand. An example of this monitoring, among other examples well-known and
available to those in the art, was reported by Traynor and Nahorski in 1995. The preferred
use of this assay system is for initial screening of candidate compounds because the system
is generically applicable to all G protein-coupled receptors regardless of the particular G
protein that interacts with the intracellular domain of the receptor.
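
As a rough illustration of how readings from such a binding assay might be compared, the following Python sketch expresses [35S]GTPγS binding in the presence of a candidate compound as a percentage change relative to basal binding by the constitutively active receptor. The function names, the 20% threshold and the example counts are hypothetical assumptions for illustration only; none of them come from this document or from the cited report.

```python
def percent_change_from_basal(cpm_with_compound: float, basal_cpm: float) -> float:
    """Percent change in bound radiolabel relative to basal (no compound)."""
    if basal_cpm <= 0:
        raise ValueError("basal counts must be positive")
    return 100.0 * (cpm_with_compound - basal_cpm) / basal_cpm


def classify_compound(change_pct: float, threshold_pct: float = 20.0) -> str:
    """Coarse label; the threshold is an arbitrary illustrative cut-off."""
    if change_pct >= threshold_pct:
        return "binding increased (agonist-like)"
    if change_pct <= -threshold_pct:
        return "binding decreased (inverse-agonist-like)"
    return "no clear effect"


# Hypothetical counts per minute, for illustration only.
basal = 1000.0
increase = percent_change_from_basal(1500.0, basal)
decrease = percent_change_from_basal(700.0, basal)
labels = [classify_compound(increase), classify_compound(decrease)]
```

A label from this kind of comparison would only flag a compound for the receptor-specific follow-up screening described next; it says nothing about where the compound acts.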
2. Specific GPCR screening assay techniques

Once candidate compounds are identified using the "generic" G protein-coupled
receptor assay (i.e., an assay to select compounds that are agonists, partial agonists, or inverse
agonists), further screening to confirm that the compounds have interacted at the receptor site
is preferred. For example, a compound identified by the "generic" assay may not bind to the
receptor, but may instead merely "uncouple" the G protein from the intracellular domain.
a. Gs and Gi. 

Gs stimulates the enzyme adenylyl cyclase. Gi (and Go), on the other hand, inhibit
this enzyme. Adenylyl cyclase catalyzes the conversion of ATP to cAMP; thus,
constitutively activated GPCRs that couple the Gs protein are associated with increased
cellular levels of cAMP. On the other hand, constitutively activated GPCRs that couple the
Gi (or Go) protein are associated with decreased cellular levels of cAMP. See, generally,
"Indirect Mechanisms of Synaptic Transmission," Chpt. 8, From Neuron To Brain (3rd Ed.)
Nichols, J.G. et al eds. Sinauer Associates, Inc. (1992). Thus, assays that detect cAMP can
be utilized to determine if a candidate compound is, e.g., an inverse agonist to the receptor
(i.e., such a compound would decrease the levels of cAMP). A variety of approaches known
in the art for measuring cAMP can be utilized; a most preferred approach relies upon the use
of anti-cAMP antibodies in an ELISA-based format. Another type of assay that can be
utilized is a whole cell second messenger reporter system assay. Promoters on genes drive
the expression of the proteins that a particular gene encodes. Cyclic AMP drives gene
expression by promoting the binding of a cAMP-responsive DNA binding protein or
transcription factor (CREB) which then binds to the promoter at specific sites called cAMP
response elements and drives the expression of the gene. Reporter systems can be constructed
which have a promoter containing multiple cAMP response elements before the reporter
gene, e.g., β-galactosidase or luciferase. Thus, a constitutively activated Gs-linked receptor
causes the accumulation of cAMP that then activates the gene and expression of the reporter
protein. The reporter protein such as β-galactosidase or luciferase can then be detected using
standard biochemical assays (Chen et al. 1995).
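
The direction of the expected cAMP change can be summarized as a small lookup, sketched here in Python. The Gs and Gi/Go rows follow the statements above (Gs-coupled constitutive activity raises cAMP, Gi- or Go-coupled activity lowers it). The extension to agonists, and to inverse agonists at Gi/Go-coupled receptors, is an inference from those statements rather than something spelled out in the text, and the function name is hypothetical.

```python
# True means the coupled G protein stimulates adenylyl cyclase, False means it inhibits it.
STIMULATES_ADENYLYL_CYCLASE = {"Gs": True, "Gi": False, "Go": False}


def expected_camp_change(g_protein: str, compound_class: str) -> str:
    """Return 'increase' or 'decrease' for the expected change in cellular cAMP.

    An inverse agonist is taken to reduce receptor activity and an agonist to
    raise it; both directions are inferred from the coupling rules in the text.
    """
    receptor_raises_camp = STIMULATES_ADENYLYL_CYCLASE[g_protein]
    if compound_class == "inverse agonist":
        return "decrease" if receptor_raises_camp else "increase"
    if compound_class == "agonist":
        return "increase" if receptor_raises_camp else "decrease"
    raise ValueError("compound_class must be 'agonist' or 'inverse agonist'")


gs_inverse = expected_camp_change("Gs", "inverse agonist")
gi_inverse = expected_camp_change("Gi", "inverse agonist")
```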
Generally, once it is determined that a GPCR is or has been constitutively activated,
using the assay techniques set forth above (as well as others), it is possible to determine the
predominant G protein that couples with the endogenous GPCR. Coupling of the G protein
to the GPCR provides a signaling pathway that can be assessed. Because it is most preferred
that screening take place by use of a mammalian expression system, such a system will be
expected to have endogenous G protein therein. Thus, by definition, in such a system, the
constitutively activated orphan GPCR will continuously signal. In this regard, it is preferred
that this signal be enhanced such that in the presence of, e.g., an inverse agonist to the
receptor, it is more likely that it will be able to more readily differentiate, particularly in the
context of screening, between the receptor when it is contacted with the inverse agonist.

The GPCR Fusion Protein is intended to enhance the efficacy of G protein coupling
with the GPCR. The GPCR Fusion Protein is preferred for screening with a non-endogenous,
constitutively activated GPCR because such an approach increases the signal
that is most preferably utilized in such screening techniques, although the GPCR Fusion
Protein can also be (and preferably is) used with an endogenous, constitutively activated
GPCR. This is important in facilitating a significant "signal to noise" ratio; such a significant
ratio is preferred for the screening of candidate compounds as disclosed herein.

The construction of a construct useful for expression of a GPCR Fusion Protein is
within the purview of those having ordinary skill in the art. Commercially available
expression vectors and systems offer a variety of approaches that can fit the particular needs
of an investigator. The criteria of importance for such a GPCR Fusion Protein construct is
that the GPCR sequence and the G protein sequence both be in-frame (preferably, the
sequence for the GPCR is upstream of the G protein sequence) and that the "stop" codon of

Other uses of the disclosed receptors will become apparent to those in the art based upon,
inter alia, a review of this patent document.

EXAMPLES 

The following examples are presented for purposes of elucidation, and not limitation,
of the present invention. While specific nucleic acid and amino acid sequences are disclosed
herein, those of ordinary skill in the art are credited with the ability to make minor
modifications to these sequences while achieving the same or substantially similar results
reported below. Unless otherwise indicated below, all nucleic acid sequences for the
disclosed endogenous orphan human GPCRs have been sequenced and verified. For
purposes of equivalent receptors, those of ordinary skill in the art will readily appreciate that
conservative substitutions can be made to the disclosed sequences to obtain a functionally
equivalent receptor.
Example 1 

Endogenous Human GPCRs
1. Identification of Human GPCRs

Several of the disclosed endogenous human GPCRs were identified based upon a
review of the GenBank database information. While searching the database, the following
cDNA clones were identified as evidenced below.



Disclosed   Accession   Complete DNA   Open Reading   Nucleic Acid   Amino Acid
Human       Number      Sequence       Frame          SEQ.ID.NO.     SEQ.ID.NO.
Orphan                  (Base Pairs)   (Base Pairs)
GPCRs

hARE-3      AL033379    111,389 bp     1,260 bp       1              2
hARE-4      AC006087    226,925 bp     1,119 bp       3              4
hARE-5      AC006255    127,605 bp     1,104 bp       5              6
hRUP3       AL035423    140,094 bp     1,005 bp       7              8
hRUP5       AC005849    169,144 bp     1,413 bp       9              10
hRUP6       AC005871    218,807 bp     1,245 bp       11             12
hRUP7       AC007922    158,858 bp     1,173 bp       13             14


Other disclosed endogenous human GPCRs were identified by conducting a BLAST
search of the EST database (dbest) using the following EST clones as query sequences. The
following EST clones identified were then used as a probe to screen a human genomic
library.

Disclosed   Query             EST Clone/             Open Reading   Nucleic Acid   Amino Acid
Human       (Sequence)        Accession No.          Frame          SEQ.ID.NO.     SEQ.ID.NO.
Orphan                        Identified             (Base Pairs)
GPCRs

hGPCR27     Mouse GPCR27      AA775870               1,125 bp       15             16
hARE-1      TDAG              1689643 AI090920       999 bp         17             18
hARE-2      GPCR27            68530 AA359504         1,122 bp       19             20
hPPR1       Bovine PPR1       238667 H67224          1,053 bp       21             22
hG2A        Mouse 1179426     See Example 2(a),      1,113 bp       23             24
                              below
hCHN3       N.A.              EST 36581              1,113 bp       25             26
                              (full length)
hCHN4       TDAG              1184934 AA804531       1,077 bp       27             28
hCHN6       N.A.              EST 2134670            1,503 bp       29             30
                              (full length)
hCHN8       KIAA0001          EST 764455             1,029 bp       31             32
hCHN9       1365839           EST 1541536            1,077 bp       33             34
hCHN10      Mouse EST         Human 1365839          1,005 bp       35             36
            1365839
hRUP4       N.A.              AI307658               1,296 bp       37             38


N.A. = "not applicable". 
2. Full Length Cloning 
a. hG2A (Seq. Id. Nos. 23 & 24) 

Mouse EST clone 1179426 was used to obtain a human genomic clone containing all
but three amino acids of the hG2A coding sequence. The 5' end of this coding sequence was
obtained by using 5' RACE™, and the template for PCR was Clontech's Human Spleen
Marathon-ready™ cDNA. The disclosed human G2A was amplified by PCR using the G2A
cDNA specific primers for the first and second round PCR as shown in SEQ.ID.NO.: 39 and
SEQ.ID.NO.: 40 as follows:

5'-CTGTGTACAGCAGTTCGCAGAGTG-3' (SEQ.ID.NO.: 39; first round PCR)
5'-GAGTGCCAGGCAGAGCAGGTAGAC-3' (SEQ.ID.NO.: 40; second round PCR).
PCR was performed using Advantage™ GC Polymerase Kit (Clontech; manufacturer
instructions will be followed), at 94°C for 30 sec followed by 5 cycles of 94°C for 5 sec and
72°C for 4 min; and 30 cycles of 94°C for 5 sec and 70°C for 4 min. An approximately 1.3 kb
PCR fragment was purified from agarose gel, digested with Hind III and Xba I and cloned
into the expression vector pRC/CMV2 (Invitrogen). The cloned insert was sequenced using
the T7 Sequenase™ kit (USB Amersham; manufacturer instructions will be followed) and
the sequence was compared with the presented sequence. Expression of the human G2A will
be detected by probing an RNA dot blot (Clontech; manufacturer instructions will be
followed) with the 32P-labeled fragment.

b. hCHN9 (Seq. Id. Nos. 33 & 34)
Sequencing of the EST clone 1541536 indicated that hCHN9 is a partial cDNA
clone having only an initiation codon; i.e., the termination codon was missing. When
hCHN9 was used to "blast" against the data base (nr), the 3' sequence of hCHN9 was
100% homologous to the 5' untranslated region of the leukotriene B4 receptor cDNA,
which contained a termination codon in frame with the hCHN9 coding sequence. To
determine whether the 5' untranslated region of LTB4R cDNA was the 3' sequence of
hCHN9, PCR was performed using primers based upon the 5' sequence flanking the
initiation codon found in hCHN9 and the 3' sequence around the termination codon found
in the LTB4R 5' untranslated region. The primer sequences utilized were as follows:
5'-CCCGAATTCCTGCTTGCTCCCAGCTTGGCCC-3' (SEQ.ID.NO.: 41; sense) and
5'-TGTGGATCCTGCTGTCAAAGGTCCCATTCCGG-3' (SEQ.ID.NO.: 42; antisense).

PCR was performed using thymus cDNA as a template and rTth polymerase (Perkin Elmer)
with the buffer system provided by the manufacturer, 0.25 µM of each primer, and 0.2 mM
of each of the 4 nucleotides. The cycle condition was 30 cycles of 94°C for 1 min, 65°C for 1 min
and 72°C for 1 min and 10 sec. A 1.1 kb fragment consistent with the predicted size was
obtained from PCR. This PCR fragment was subcloned into pCMV (see below) and
sequenced (see, SEQ.ID.NO.: 33).

c. hRUP4 (Seq. Id. Nos. 37 & 38)
The full length hRUP4 was cloned by RT-PCR with human brain cDNA (Clontech)
as templates:

5'-TCACAATGCTAGGTGTGGTC-3' (SEQ.ID.NO.: 43; sense) and
5'-TGCATAGACAATGGGATTACAG-3' (SEQ.ID.NO.: 44; antisense).

PCR was performed using TaqPlus™ Precision™ polymerase (Stratagene; manufacturer
instructions will be followed) by the following cycles: 94°C for 2 min; 94°C for 30 sec; 55°C
for 30 sec, 72°C for 45 sec, and 72°C for 10 min. Cycles 2 through 4 were repeated 30
times.
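
The cycling program just described (five steps, with steps 2 through 4 repeated 30 times) can be written out explicitly to see the total programmed hold time. The Python sketch below is illustrative only: the step names and the helper function are hypothetical, and ramp times between temperatures, which depend on the instrument, are ignored.

```python
# (step name, temperature in degrees C, hold time in seconds), in the order given in the text.
HRUP4_STEPS = [
    ("initial denaturation", 94, 120),
    ("denaturation", 94, 30),
    ("annealing", 55, 30),
    ("extension", 72, 45),
    ("final extension", 72, 600),
]


def total_hold_seconds(steps, first_cycled=1, last_cycled=3, repeats=30):
    """Sum hold times, repeating steps[first_cycled..last_cycled] `repeats` times.

    Indices are 0-based, so the defaults select steps 2 through 4 of the text.
    """
    cycled = sum(seconds for _, _, seconds in steps[first_cycled:last_cycled + 1])
    fixed = sum(seconds for i, (_, _, seconds) in enumerate(steps)
                if not first_cycled <= i <= last_cycled)
    return fixed + repeats * cycled


hrup4_total_seconds = total_hold_seconds(HRUP4_STEPS)
hrup4_total_minutes = hrup4_total_seconds / 60
```

Under these assumptions the program holds for 3,870 seconds (64.5 minutes) in total, of which the 30 repeated cycles account for 3,150 seconds.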

The PCR products were separated on a 1% agarose gel and a 500 bp PCR fragment
was isolated and cloned into the pCRII-TOPO vector (Invitrogen) and sequenced using the
T7 DNA Sequenase™ kit (Amersham) and the SP6/T7 primers (Stratagene). Sequence
analysis revealed that the PCR fragment was indeed an alternatively spliced form of
AI307658 having a continuous open reading frame with similarity to other GPCRs. The
completed sequence of this PCR fragment was as follows:
5'-TCACAATGCTAGGTGTGGTCTGGCTGGTGGCAGTCATCGTAGGATCACCCATGTGGCAC
GTGCAACAACTTGAGATCAAATATGACTTCCTATATGAAAAGGAACACATCTGCTGCTTAGAA
GAGTGGACCAGCCCTGTGCACCAGAAGATCTACACCACCTTCATCCTTGTCATCCTCTTCCTCC
TGCCTCTTATGGTGATGCTTATTCTGTACGTAAAATTGGTTATGAACTTTGGATAAAGAAAAGA
GTTGGGGATGGTTCAGTGCTTCGAACTATTCATGGAAAAGAAATGTCCAAAATAGCCAGGAAG
AAGAAACGAGCTGTCATTATGATGGTGACAGTGGTGGCTCTCTTTGCTGTGTGCTGGGCACCA
TTCCATGTTGTCCATATGATGATTGAATACAGTAATTTTGAAAAGGAATATGATGATGTCACA
ATCAAGATGATTTTTGCTATCGTGCAAATTATTGGATTTTCCAACTCCATCTGTAATCCCATTG
TCTATGCA-3' (SEQ.ID.NO.: 45)

Based on the above sequence, two sense oligonucleotide primer sets: 
5'-CTGCTTAGAAGAGTGGACCAG-3' (SEQ.ID.NO.: 46; oligo I), 
25 S'-CTGTGCACCAGAAGATCTACAC-S 1 (SEQ.IDNO.: 47; oligo 2) 
and two antisense oligonucleotide primer sets: 

5-CAAGGATGAAGGTGGTGTAGA-3' (SEQ.ID.NO.: 48; oligo 3) 
5'-GTGTAGATCTTCTGGTGCACAGG-3' (SEQ.ID.NO.: 49; oligo 4) 

were used for 3'- and 5 '-race PCR with a human brain Marathon-Ready™ cDNA (Clontech, 



HTJUULIU. UIU UIDiLttnii 



WO 00/31 258 PCT/LJS99/23687 

-21 - 

Cat# 7400-1) as template, according to manufacturer's instructions. DNA fragments
generated by the RACE PCR were cloned into the pCRII-TOPO™ vector (Invitrogen) and
sequenced using the SP6/T7 primers (Stratagene) and some internal primers. The 3' RACE
product contained a poly(A) tail and a completed open reading frame ending at a TAA stop
codon. The 5' RACE product contained an incomplete 5' end; i.e., the ATG initiation codon
was not present.
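
As a consistency check on the primer design described above, each sense oligo should occur verbatim in SEQ.ID.NO.: 45, and each antisense oligo should occur there as its reverse complement. The sketch below (Python; the helper names are illustrative and not from the source) tests this against an excerpt of the fragment copied from the text.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


# Excerpt of SEQ.ID.NO.: 45 that spans all four primer sites.
fragment = (
    "CATCTGCTGCTTAGAA"
    "GAGTGGACCAGCCCTGTGCACCAGAAGATCTACACCACCTTCATCCTTGTCATCC"
)

primers = {
    "oligo 1": ("CTGCTTAGAAGAGTGGACCAG", "sense"),
    "oligo 2": ("CTGTGCACCAGAAGATCTACAC", "sense"),
    "oligo 3": ("CAAGGATGAAGGTGGTGTAGA", "antisense"),
    "oligo 4": ("GTGTAGATCTTCTGGTGCACAGG", "antisense"),
}

# Position of each primer's binding site on the sense strand (-1 if absent).
positions = {}
for name, (seq, strand) in primers.items():
    site = seq if strand == "sense" else reverse_complement(seq)
    positions[name] = fragment.find(site)

for name, pos in positions.items():
    print(name, "found" if pos >= 0 else "NOT found", pos)
```

With the sequences as printed, all four sites are found, and oligo 4 turns out to be the reverse complement of oligo 2 extended by one base.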

Based on the new 5' sequence, oligo 3 and the following primer:
5ViOAATGCAGGTCATAGTGAGC -V (SEQ.ID.NO.: 50; oligo 5)
were used for the second round of 5' RACE PCR and the PCR products were analyzed as
above. A third round of 5' RACE PCR was carried out utilizing antisense primers:
5'-TGGAGCATGGTGACGGGAATGCAGAAG-3' (SEQ.ID.NO.: 51; oligo 6) and
5'-GTGATGAGCAGGTCACTGAGCGCCAAG-3' (SEQ.ID.NO.: 52; oligo 7).
The sequence of the 5' RACE PCR products revealed the presence of the initiation codon
ATG, and a further round of 5' RACE PCR did not generate any more 5' sequence. The
completed 5' sequence was confirmed by RT-PCR using sense primer
5'-GCAATGCAGGCGCTTAACATTAC-3' (SEQ.ID.NO.: 53; oligo 8)
and oligo 4 as primers and sequence analysis of the 650 bp PCR product generated from
human brain and heart cDNA templates (Clontech, Cat# 7404-1). The completed 3'
sequence was confirmed by RT-PCR using oligo 2 and the following antisense primer:
5'-TTGGGTTACAATCTGAAGGGCA-3' (SEQ.ID.NO.: 54; oligo 9)
and sequence analysis of the 670 bp PCR product generated from human brain and heart
cDNA templates (Clontech, Cat# 7404-1).

d. hRUP5 (Seq. Id. Nos. 9 & 10)
The full length hRUP5 was cloned by RT-PCR using a sense primer upstream from
ATG, the initiation codon (SEQ.ID.NO.: 55), and an antisense primer containing TCA as the
stop codon (SEQ.ID.NO.: 56), which had the following sequences:
5'-ACTCCGTGTCCAGCAGGACTCTG-3' (SEQ.ID.NO.: 55)
5'-TGCGTGTTCCTGGACCCTCACGTG-3' (SEQ.ID.NO.: 56)
and human peripheral leukocyte cDNA (Clontech) as a template. Advantage cDNA
polymerase (Clontech) was used for the amplification in a 50 µl reaction by the following
cycle with step 2 through step 4 repeated 30 times: 94°C for 30 sec; 94°C for 15 sec; 69°C for
40 sec; 72°C for 3 min; and 72°C for 6 min. A 1.4 kb PCR fragment was isolated and cloned
with the pCRII-TOPO™ vector (Invitrogen) and completely sequenced using the T7 DNA
Sequenase™ kit (Amersham). See, SEQ.ID.NO.: 9.

e. hRUP6 (Seq. Id. Nos. 11 & 12)

The full length hRUP6 was cloned by RT-PCR using primers:
5'-CAGGCCTTGGATTTTAATGTCAGGGATGG-3' (SEQ.ID.NO.: 57) and
5'-GGAGAGTCAGCTCTGAAAGAATTCAGG-3' (SEQ.ID.NO.: 58);
and human thymus Marathon-Ready™ cDNA (Clontech) as a template. Advantage cDNA
polymerase (Clontech, according to manufacturer's instructions) was used for the
amplification in a 50 µl reaction by the following cycle: 94°C for 30 sec; 94°C for 5 sec; 66°C
for 40 sec; 72°C for 2.5 min; and 72°C for 7 min. Cycles 2 through 4 were repeated 30 times.
A 1.3 Kb PCR fragment was isolated and cloned into the pCRII-TOPO™ vector (Invitrogen)
and completely sequenced (see, SEQ.ID.NO.: 11) using the ABI Big Dye Terminator™ kit
(P.E. Biosystem).

f. hRUP7 (Seq. Id. Nos. 13 & 14)

The full length hRUP7 was cloned by RT-PCR using primers:
5'-TGATGTGATGCCAGATACTAATAGCAC-3' (SEQ.ID.NO.: 59; sense) and
S -CC VGA TTGA I I I AGG 1 GAG A II GAG ACXV (SEQ.ID.NO.: 60; antisense)
and human peripheral leukocyte cDNA (Clontech) as a template. Advantage™ cDNA
polymerase (Clontech) was used for the amplification in a 50 µl reaction by the following
cycle with step 2 to step 4 repeated 30 times: 94°C for 2 minutes; 94°C for 15 seconds; 60°C
for 20 seconds; 72°C for 2 minutes; 72°C for 10 minutes. A 1.25 Kb PCR fragment was
isolated and cloned into the pCRII-TOPO™ vector (Invitrogen) and completely sequenced
using the ABI Big Dye Terminator™ kit (P.E. Biosystem). See, SEQ.ID.NO.: 13.
g. hARE-5 (Seq. Id. Nos. 5 & 6)
The full length hARE-5 was cloned by PCR using the hARE-5 specific primers
5'-CAGCGCAGGGTGAAGOCTGAGAGC-3' SEQ.ID.NO.: 69 (sense, 5' of initiation codon ATG)
and 5'-GGCACGTGGTGTGAGGTGTGGAGG-3' SEQ.ID.NO.: 70 (antisense, 3' of stop codon TGA)
and human genomic DNA as template. TaqPlus Precision™ DNA polymerase (Stratagene)
was used for the amplification by the following cycle with step 2 to step 4 repeated 35 times:
96°C, 2 minutes; 96°C, 20 seconds; 58°C, 30 seconds; 72°C, 2 minutes; and 72°C, 10 minutes.

A 1.1 Kb PCR fragment of predicted size was isolated and cloned into the
pCRII-TOPO™ vector (Invitrogen) and completely sequenced (SEQ.ID.NO.: 5) using the T7
DNA Sequenase™ kit (Amersham).

h. hARE-4 (Seq. Id. Nos.: 3 & 4)
The full length hARE-4 was cloned by PCR using the hARE-4 specific primers
5'-CTGGTGTGCTCCATGGCATCCC-3' SEQ.ID.NO.: 67 (sense, 5' of initiation codon ATG) and
5'-GTAAGCCTGCCAGAACGAGAGG-3' SEQ.ID.NO.: 68 (antisense, 3' of stop codon TGA) and
human genomic DNA as template. Taq DNA polymerase (Stratagene) and 5% DMSO were
used for the amplification by the following cycle with step 2 to step 3 repeated 35 times:
94°C, 3 minutes; 94°C, 30 seconds; 59°C, 2 minutes; 72°C, 10 minutes.

A 1.12 Kb PCR fragment of predicted size was isolated and cloned into the
pCRII-TOPO™ vector (Invitrogen) and completely sequenced (SEQ.ID.NO.: 3) using the T7 DNA
Sequenase™ kit (Amersham).
i. hARE-3 (Seq. Id. Nos.: 1 & 2)

The full length hARE-3 was cloned by PCR using the hARE-3 specific primers
5'-gatcaagcttCCATCCTACTGAAACCATGGTC-3' SEQ.ID.NO.: 65 (sense, lower case nucleotides
represent Hind III overhang, ATG as initiation codon) and
5'-gatcagatctCAGTTCCAATATTCACACCACCGTC-3' SEQ.ID.NO.: 66 (antisense, lower case
nucleotides represent Xba I overhang, TCA as stop codon) and human genomic DNA as
template. TaqPlus Precision™ DNA polymerase (Stratagene) was used for the amplification
by the following cycle with step 2 to step 4 repeated 35 times: 94°C, 3 minutes; 94°C, 1
minute; 55°C, 1 minute; 72°C, 2 minutes; 72°C, 10 minutes.

A 1.3 Kb PCR fragment of predicted size was isolated and digested with Hind III
and Xba I, cloned into the pRC/CMV2 vector (Invitrogen) at the Hind III and Xba I sites and
completely sequenced (SEQ.ID.NO.: 1) using the T7 DNA Sequenase™ kit (Amersham).
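
Because the cloning step above depends on the restriction sites carried in the primer tails, it can be useful to scan the printed primers for recognition sequences. The sketch below is an illustrative Python check, not part of the original method; the `SITES` table lists only three well-known palindromic recognition sequences, and the helper names are hypothetical.

```python
# Recognition sequences (all palindromic, so strand does not matter).
SITES = {"HindIII": "AAGCTT", "XbaI": "TCTAGA", "BglII": "AGATCT"}


def tail(primer: str) -> str:
    """Return the lower-case (non-templated) 5' tail of a primer."""
    return "".join(c for c in primer if c.islower())


def sites_in(seq: str):
    """List the names of recognition sequences present in seq."""
    s = seq.upper()
    return sorted(name for name, site in SITES.items() if site in s)


sense = "gatcaagcttCCATCCTACTGAAACCATGGTC"        # SEQ.ID.NO.: 65 as printed
antisense = "gatcagatctCAGTTCCAATATTCACACCACCGTC"  # SEQ.ID.NO.: 66 as printed

print("sense tail:", tail(sense), sites_in(tail(sense)))
print("antisense tail:", tail(antisense), sites_in(tail(antisense)))
```

In the tails as printed here, the sense primer carries AAGCTT (Hind III), while the antisense tail reads AGATCT, which is the Bgl II recognition sequence rather than TCTAGA (Xba I). Whether this reflects the primer actually used or the transcription of the sequence cannot be settled from the text alone.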
j. hRUP3 (Seq. Id. Nos.: 7 & 8)
The full length hRUP3 was cloned by PCR using the hRUP3 specific primers
5'-GTCCTGCCACTTCGAGACATGG-3' SEQ.ID.NO.: 71 (sense, ATG as initiation codon) and
5'-GAAACTTCTCTGCCCTTACCGTC-3' SEQ.ID.NO.: 72 (antisense, 3' of stop codon TAA) and
human genomic DNA as template. TaqPlus Precision™ DNA polymerase (Stratagene) was
used for the amplification by the following cycle with step 2 to step 4 repeated 35 times:
94°C, 3 minutes; 94°C, 1 minute; 58°C, 1 minute; 72°C, 2 minutes; 72°C, 10 minutes.

A 1.0 Kb PCR fragment of predicted size was isolated and cloned into the
pCRII-TOPO™ vector (Invitrogen) and completely sequenced (SEQ.ID.NO.: 7) using the T7 DNA
Sequenase™ kit (Amersham).

Example 2
Receptor Expression

Although a variety of cells are available to the art for the expression of proteins, it is
most preferred that mammalian cells be utilized. The primary reason for this is predicated
upon practicalities, i.e., utilization of, e.g., yeast cells for the expression of a GPCR, while
possible, introduces into the protocol a non-mammalian cell which may not (indeed, in the
case of yeast, does not) include the receptor-coupling, genetic-mechanism and secretory
pathways that have evolved for mammalian systems; thus, results obtained in
non-mammalian cells, while of potential use, are not as preferred as those obtained from
mammalian cells. Of the mammalian cells, COS-7, 293 and 293T cells are particularly
preferred, although the specific mammalian cell utilized can be predicated upon the particular
needs of the artisan. The general procedure for expression of the disclosed GPCRs is as
follows.

On day one, 1x10^7 293T cells per 150 mm plate were plated out. On day two, two
reaction tubes will be prepared (the proportions to follow for each tube are per plate): tube
A will be prepared by mixing 20 µg DNA (e.g., pCMV vector; pCMV vector with receptor
cDNA, etc.) in 1.2 ml serum free DMEM (Irvine Scientific, Irvine, CA); tube B will be
prepared by mixing 120 µl lipofectamine (Gibco BRL) in 1.2 ml serum free DMEM. Tubes
A and B are admixed by inversions (several times), followed by incubation at room
temperature for 30-45 min. The admixture can be referred to as the "transfection mixture".
Plated 293T cells are washed with 1X PBS, followed by addition of 10 ml serum free DMEM.
2.4 ml of the transfection mixture will then be added to the cells, followed by incubation for
4 hrs at 37°C/5% CO2. The transfection mixture was then removed by aspiration, followed
by the addition of 25 ml of DMEM/10% Fetal Bovine Serum. Cells will then be incubated
at 37°C/5% CO2. After 72 hr incubation, cells can then be harvested and utilized for analysis.
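
Since the tube A and tube B proportions above are stated per 150 mm plate, scaling to several plates is a simple multiplication. The following Python sketch illustrates this; the dictionary keys and the `scale` function are hypothetical names, and no pipetting overage is assumed because the text does not mention one.

```python
# Per-plate amounts as stated in the text (one 150 mm plate).
PER_PLATE = {
    "DNA (ug)": 20.0,
    "serum-free DMEM, tube A (ml)": 1.2,
    "lipofectamine (ul)": 120.0,
    "serum-free DMEM, tube B (ml)": 1.2,
}


def scale(plates: int) -> dict:
    """Multiply every per-plate amount by the number of plates."""
    if plates < 1:
        raise ValueError("plates must be at least 1")
    return {name: round(amount * plates, 3) for name, amount in PER_PLATE.items()}


for name, amount in scale(4).items():
    print(f"{name}: {amount}")
```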
Example 3

Tissue Distribution of the Disclosed Human GPCRs

Several approaches can be used for determination of the tissue distribution of the
GPCRs disclosed herein.

1. Dot-Blot Analysis

Using a commercially available human-tissue dot-blot format, endogenous orphan
GPCRs were probed for a determination of the areas where such receptors are localized.
cDNA fragments from the GPCRs of Example 1 (radiolabeled) were (or can be) used as the
probe: radiolabeled probe was (or can be) generated using the complete receptor cDNA
(excised from the vector) using a Prime-It II™ Random Primer Labeling Kit (Stratagene,
#300385), according to manufacturer's instructions. A human RNA Master Blot™
(Clontech, #7770-1) was hybridized with the endogenous human GPCR radiolabeled probe
and washed under stringent conditions according to manufacturer's instructions. The blot was
exposed to Kodak BioMax™ Autoradiography film overnight at -80°C. Results are
summarized for several receptors in Tables B and C (see Figures 1A and 1B for a grid
identifying the various tissues and their locations, respectively). Exemplary dot-blots are
provided in Figures 2A and 2B for results derived using hCHN3 and hCHN8, respectively.

TABLE B

Orphan GPCR     Tissue Distribution
                (highest levels, relative to other tissues in the dot-blot)

hGPCR27         Fetal brain, Putamen, Pituitary gland, Caudate nucleus
hARE-1          Spleen, Peripheral leukocytes, Fetal spleen
hPPR1           Pituitary gland, Heart, Salivary gland, Small intestine, Testis
hRUP3           Pancreas
hCHN3           Fetal brain, Putamen, Occipital cortex
hCHN9           Pancreas, Small intestine, Liver
hCHN10          Kidney, Thyroid

TABLE C

Orphan GPCR     Tissue Distribution
                (highest levels, relative to other tissues in the dot-blot)

hARE-3          Cerebellum left, Cerebellum right, Testis, Accumbens
hGPCR3          Corpus callosum, Caudate nucleus, Liver, Heart, Inter-Ventricular Septum
hARE-2          Cerebellum left, Cerebellum right, Substantia
hCHN8           Cerebellum left, Cerebellum right, Kidney, Lung



2. RT-PCR
a. hRUP3

To ascertain the tissue distribution of hRUP3 mRNA, RT-PCR was performed using
hRUP3-specific primers and human multiple tissue cDNA panels (MTC, Clontech) as
templates. Taq DNA polymerase (Stratagene) was utilized for the PCR reaction, using the
following reaction cycles in a 40 µl reaction: 94°C for 2 min; 94°C for 15 sec; 55°C for 30
sec; 72°C for 1 min; 72°C for 10 min. Primers were as follows:
v -( J ACAGC ) lACCnCiC(^ATCAAG-.r (SEQ.ID.NO.: 61; sense)
5 - ( T( i C ' A C A A TG CC A G TC i A ' T A A G G - 3 1 (SEQ.ID.NO.: 62; antisense).

20 µl of the reaction was loaded onto a 1% agarose gel; results are set forth in Figure 3.
As is supported by the data of Figure 3, of the 16 human tissues in the cDNA panel
utilized (brain, colon, heart, kidney, lung, ovary, pancreas, placenta, prostate, skeleton, small
intestine, spleen, testis, thymus, leukocyte, and liver) a single hRUP3 band is evident only
from the pancreas. Additional comparative analysis of the protein sequence of hRUP3 with
other GPCRs suggests that hRUP3 is related to GPCRs having small molecule endogenous
ligand such that it is predicted that the endogenous ligand for hRUP3 is a small molecule.
b. hRUP4

RT-PCR was performed using hRUP4 oligos 8 and 4 as primers and the human
multiple tissue cDNA panels (MTC, Clontech) as templates. Taq DNA polymerase
(Stratagene) was used for the amplification in a 40 µl reaction by the following cycles: 94°C
for 30 seconds, 94°C for 10 seconds, 55°C for 30 seconds, 72°C for 2 minutes, and 72°C for
5 minutes with cycles 2 through 4 repeated 30 times.

20 µl of the reaction were loaded on a 1% agarose gel to analyze the RT-PCR
products, and hRUP4 mRNA was found expressed in many human tissues, with the strongest
expression in heart and kidney (see, Figure 4). To confirm the authenticity of the PCR
fragments, a 300 bp fragment derived from the 5' end of hRUP4 was used as a probe for the
Southern Blot analysis. The probe was labeled with 32P-dCTP using the Prime-It II™
Random Primer Labeling Kit (Stratagene) and purified using the ProbeQuant™ G-50 micro
columns (Amersham). Hybridization was done overnight at 42°C following a 12 hr
pre-hybridization. The blot was finally washed at 65°C with 0.1 x SSC. The Southern blot did
confirm the PCR fragments as hRUP4.



c. hRUP5
RT-PCR was performed using the following hRUP5 specific primers:
5'-CTGACTTCTTGTTCCTGGCAGCAGCGG-3' (SEQ.ID.NO.: 63; sense)
5'-AGACCAGCO\GGGCACGCTGAAGAGTG-3' (SEQ.ID.NO.: 64; antisense)
and the human multiple tissue cDNA panels (MTC, Clontech) as templates. Taq DNA
polymerase (Stratagene) was used for the amplification in a 40 µl reaction by the following
cycles: 94°C for 30 sec, 94°C for 10 sec, 62°C for 1.5 min, 72°C for 5 min, and with cycles
2 through 3 repeated 30 times. 20 µl of the reaction were loaded on a 1.5% agarose gel to
analyze the RT-PCR products, and hRUP5 mRNA was found expressed only in the
peripheral blood leukocytes (data not shown).
d. hRUP6

RT-PCR was applied to confirm the expression and to determine the tissue
distribution of hRUP6. Oligonucleotides used, based on an alignment of AC005871 and
GPR66 segments, had the following sequences:
5'-CCAACACCAGCATCCATGGCATCAAG-3' (SEQ.ID.NO.: 73; sense),
5'-GGAGAGTCAGCTCTGAAAGAATTCAGG-3' (SEQ.ID.NO.: 74; antisense)
and the human multiple tissue cDNA panels (MTC, Clontech) were used as templates.
PCR was performed using TaqPlus Precision™ polymerase (Stratagene; manufacturer's
instructions will be followed) in a 40 µl reaction by the following cycles: 94°C for 30 sec;
94°C for 5 sec; 66°C for 40 sec; 72°C for 2.5 min; and 72°C for 7 min. Cycles 2 through 4
were repeated 30 times.

20 µl of the reaction were loaded on a 1.2% agarose gel to analyze the RT-PCR
products, and a specific 760 bp DNA fragment representing hRUP6 was expressed
predominantly in the thymus and with less expression in the heart, kidney, lung, prostate,
small intestine and testis (see, Figure 5).

It is intended that each of the patents, applications, and printed publications
mentioned in this patent document be hereby incorporated by reference in their entirety.

As those skilled in the art will appreciate, numerous changes and modifications
may be made to the preferred embodiments of the invention without departing from the
spirit of the invention. It is intended that all such variations fall within the scope of the
invention and the claims that follow.

Although a variety of Vectors are available to those in the art, for purposes of
utilization for both endogenous and non-endogenous human GPCRs, it is most preferred
that the Vector utilized be pCMV. This vector was deposited with the American Type
Culture Collection (ATCC) on October 13, 1998 (10801 University Blvd., Manassas, VA
20110-2209 USA) under the provisions of the Budapest Treaty for the International
Recognition of the Deposit of Microorganisms for the Purpose of Patent Procedure. The
DNA was tested by the ATCC and determined to be. The ATCC has assigned the
following deposit number to pCMV: ATCC #203351.

CLAIMS



What is claimed is:

1. A cDNA encoding a human G protein-coupled receptor comprising
SEQ.ID.NO.: 1.

2. A human G protein-coupled receptor encoded by the cDNA of
SEQ.ID.NO.: 1 comprising SEQ.ID.NO.: 2.

3. A Plasmid comprising a Vector and the cDNA of SEQ.ID.NO.: 1.

4. A Host Cell comprising the Plasmid of claim 3.

5. A cDNA encoding a human G protein-coupled receptor comprising
SEQ.ID.NO.: 3.

6. A human G protein-coupled receptor encoded by the cDNA of
SEQ.ID.NO.: 3 comprising SEQ.ID.NO.: 4.

7. A Plasmid comprising a Vector and the cDNA of SEQ.ID.NO.: 3.

8. A Host Cell comprising the Plasmid of claim 7.

9. A cDNA encoding a human G protein-coupled receptor comprising
SEQ.ID.NO.: 5.

10. A human G protein-coupled receptor encoded by the cDNA of
SEQ.ID.NO.: 5 comprising SEQ.ID.NO.: 6.

11. A Plasmid comprising a Vector and the cDNA of SEQ.ID.NO.: 5.

12. A Host Cell comprising the Plasmid of claim 11.



13. A cDNA encoding a human G protein-coupled receptor comprising
SEQ.ID.NO.: 7.

14. A human G protein-coupled receptor encoded by the cDNA of
SEQ.ID.NO.: 7 comprising SEQ.ID.NO.: 8.

15. A Plasmid comprising a Vector and the cDNA of SEQ.ID.NO.: 7.

16. A Host Cell comprising the Plasmid of claim 15.

17. A cDNA encoding a human G protein-coupled receptor comprising
SEQ.ID.NO.: 9.

18. A human G protein-coupled receptor encoded by the cDNA of
SEQ.ID.NO.: 9 comprising SEQ.ID.NO.: 10.

19. A Plasmid comprising a Vector and the cDNA of SEQ.ID.NO.: 9.

20. A Host Cell comprising the Plasmid of claim 19.

21. A cDNA encoding a human G protein-coupled receptor comprising
SEQ.ID.NO.: 11.

22. A human G protein-coupled receptor encoded by the cDNA of
SEQ.ID.NO.: 11 comprising SEQ.ID.NO.: 12.

23. A Plasmid comprising a Vector and the cDNA of SEQ.ID.NO.: 11.

24. A Host Cell comprising the Plasmid of claim 23.

25. A cDNA encoding a human G protein-coupled receptor comprising
SEQ.ID.NO.: 13.

26. A human G protein-coupled receptor encoded by the cDNA of
SEQ.ID.NO.: 13 comprising SEQ.ID.NO.: 14.

27. A Plasmid comprising a Vector and the cDNA of SEQ.ID.NO.: 13.

28. A Host Cell comprising the Plasmid of claim 27.

29. A cDNA encoding a human G protein-coupled receptor comprising
SEQ.ID.NO.: 15.

30. A human G protein-coupled receptor encoded by the cDNA of
SEQ.ID.NO.: 15 comprising SEQ.ID.NO.: 16.

31. A Plasmid comprising a Vector and the cDNA of SEQ.ID.NO.: 15.

32. A Host Cell comprising the Plasmid of claim 31.

33. A cDNA encoding a human G protein-coupled receptor comprising
SEQ.ID.NO.: 17.

34. A human G protein-coupled receptor encoded by the cDNA of
SEQ.ID.NO.: 17 comprising SEQ.ID.NO.: 18.

35. A Plasmid comprising a Vector and the cDNA of SEQ.ID.NO.: 17.

36. A Host Cell comprising the Plasmid of claim 35.

37. A cDNA encoding a human G protein-coupled receptor comprising
SEQ.ID.NO.: 19.

38. A human G protein-coupled receptor encoded by the cDNA of
SEQ.ID.NO.: 19 comprising SEQ.ID.NO.: 20.

39. A Plasmid comprising a Vector and the cDNA of SEQ.ID.NO.: 19.

40. A Host Cell comprising the Plasmid of claim 39.

41. A cDNA encoding a human G protein-coupled receptor comprising
SEQ.ID.NO.: 21.

42. A human G protein-coupled receptor encoded by the cDNA of
SEQ.ID.NO.: 21 comprising SEQ.ID.NO.: 22.

43. A Plasmid comprising a Vector and the cDNA of SEQ.ID.NO.: 21.

44. A Host Cell comprising the Plasmid of claim 43.

45. A cDNA encoding a human G protein-coupled receptor comprising
SEQ.ID.NO.: 23.

46. A human G protein-coupled receptor encoded by the cDNA of
SEQ.ID.NO.: 23 comprising SEQ.ID.NO.: 24.

47. A Plasmid comprising a Vector and the cDNA of SEQ.ID.NO.: 23.

48. A Host Cell comprising the Plasmid of claim 47.

49. A cDNA encoding a human G protein-coupled receptor comprising
SEQ.ID.NO.: 25.

50. A human G protein-coupled receptor encoded by the cDNA of
SEQ.ID.NO.: 25 comprising SEQ.ID.NO.: 26.

51. A Plasmid comprising a Vector and the cDNA of SEQ.ID.NO.: 25.

52. A Host Cell comprising the Plasmid of claim 51.

53. A cDNA encoding a human G protein-coupled receptor comprising
SEQ.ID.NO.: 27.

54. A human G protein-coupled receptor encoded by the cDNA of
SEQ.ID.NO.: 27 comprising SEQ.ID.NO.: 28.

55. A Plasmid comprising a Vector and the cDNA of SEQ.ID.NO.: 27.

56. A Host Cell comprising the Plasmid of claim 55.

57. A cDNA encoding a human G protein-coupled receptor comprising
SEQ.ID.NO.: 29.

58. A human G protein-coupled receptor encoded by the cDNA of
SEQ.ID.NO.: 29 comprising SEQ.ID.NO.: 30.

59. A Plasmid comprising a Vector and the cDNA of SEQ.ID.NO.: 29.

60. A Host Cell comprising the Plasmid of claim 59.

61. A cDNA encoding a human G protein-coupled receptor comprising
SEQ.ID.NO.: 31.

62. A human G protein-coupled receptor encoded by the cDNA of
SEQ.ID.NO.: 31 comprising SEQ.ID.NO.: 32.

63. A Plasmid comprising a Vector and the cDNA of SEQ.ID.NO.: 31.

64. A Host Cell comprising the Plasmid of claim 63.

65. A cDNA encoding a human G protein-coupled receptor comprising
SEQ.ID.NO.: 33.

66. A human G protein-coupled receptor encoded by the cDNA of
SEQ.ID.NO.: 33 comprising SEQ.ID.NO.: 34.

67. A Plasmid comprising a Vector and the cDNA of SEQ.ID.NO.: 33.

68. A Host Cell comprising the Plasmid of claim 67.

69. A cDNA encoding a human G protein-coupled receptor comprising
SEQ.ID.NO.: 35.

70. A human G protein-coupled receptor encoded by the cDNA of
SEQ.ID.NO.: 35 comprising SEQ.ID.NO.: 36.

71. A Plasmid comprising a Vector and the cDNA of SEQ.ID.NO.: 35.

72. A Host Cell comprising the Plasmid of claim 71.

73. A cDNA encoding a human G protein-coupled receptor comprising
SEQ.ID.NO.: 37.

74. A human G protein-coupled receptor encoded by the cDNA of
SEQ.ID.NO.: 37 comprising SEQ.ID.NO.: 38.

75. A Plasmid comprising a Vector and the cDNA of SEQ.ID.NO.: 37.

76. A Host Cell comprising the Plasmid of claim 75.

SEQUENCE LISTING

(1) GENERAL INFORMATION:

     (i) APPLICANT: Chen, Ruoping
                    Dang, Huong T.
                    Liaw, Chen W.
                    Lin, I-Lin

    (ii) TITLE OF INVENTION: Human Orphan G Protein-Coupled Receptors

   (iii) NUMBER OF SEQUENCES: 74

    (iv) CORRESPONDENCE ADDRESS:
          (A) ADDRESSEE: Arena Pharmaceuticals Inc
          (B) STREET: 6166 Nancy Ridge Drive
          (C) CITY: San Diego
          (D) STATE: CA
          (E) COUNTRY: USA
          (F) ZIP: 92121

     (v) COMPUTER READABLE FORM:
          (A) MEDIUM TYPE: Floppy disk
          (B) COMPUTER: IBM PC compatible
          (C) OPERATING SYSTEM: PC-DOS/MS-DOS
          (D) SOFTWARE: PatentIn Release #1.0, Version #1.30

    (vi) CURRENT APPLICATION DATA:
          (A) APPLICATION NUMBER: US
          (B) FILING DATE:
          (C) CLASSIFICATION:

  (viii) ATTORNEY/AGENT INFORMATION:
          (A) NAME: Burgoon, Richard P.
          (B) REGISTRATION NUMBER: 34,787

    (ix) TELECOMMUNICATION INFORMATION:
          (A) TELEPHONE: (858)453-7200
          (B) TELEFAX: (858)453-7210

(2) INFORMATION FOR SEQ ID NO: 1:

     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 1260 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: single
          (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: DNA (genomic)

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:

ATGGTCTTCT CGGCAGTGTT GACTGCGTTC CATACCGGGA CATCCAACAC AACATTTGTC    60
GTGTATGAAA ACACCTACAT GAATATTACA CTCCCTCCAC CATTCCAGCA TCCTGACCTC   120
AGTCCATTGC TTAGATATAG TTTTGAAACC ATGGCTCCCA CTGGTTTGAG TTCCTTGACC   180
GTGAATAGTA CAGCTGTGCC CACAACACCA GCAGCATTTA AGAGCCTAAA CTTGCCTCTT   240
CAGATCACCC TTTCTGCTAT AATGATATTC ATTCTGTTTG TGTCTTTTCT TGGGAACTTG   300
GTTGTTTGCC TCATGGTTTA CCAAAAAGCT GCCATGAGGT CTGCAATTAA CATCCTCCTT   360
GCCAGCCTAG CTTTTGCAGA CATGTTGCTT GCAGTGCTGA ACATGCCCTT TGCCCTGGTA   420
ACTATTCTTA CTACCCGATG GATTTTTGGG AAATTCTTCT GTAGGGTATC TGCTATGTTT   480
TTCTGGTTAT TTGTGATAGA AGGAGTAGCC ATCCTGCTCA TCATTAGCAT AGATAGGTTC   540
CTTATTATAG TCCAGAGGCA GGATAAGCTA AACCCATATA GAGCTAAGGT TCTGATTGCA   600
GTTTCTTGGG CAACTTCCTT TTGTGTAGCT TTTCCTTTAG CCGTAGGAAA CCCCGACCTG   660
CAGATACCTT CCCGAGCTCC CCAGTGTGTG TTTGGGTACA CAACCAATCC AGGCTACCAG   720
GCTTATGTGA TTTTGATTTC TCTCATTTCT TTCTTCATAC CCTTCCTGGT AATACTGTAC   780
TCATTTATGG GCATACTCAA CACCCTTCGG CACAATGCCT TGAGGATCCA TAGCTACCCT   840
GAAGGTATAT GCCTCAGCCA GGCCAGCAAA CTGGGTCTCA TGAGTCTGCA GAGACCTTTC   900
CAGATGAGCA TTGACATGGG CTTTAAAACA CGTGCCTTCA CCACTATTTT GATTCTCTTT   960
GCTGTCTTCA TTGTCTGCTG GGCCCCATTC ACCACTTACA GCCTTGTGGC AACATTCAGT  1020
AAGCACTTTT ACTATCAGCA CAACTTTTTT GAGATTAGCA CCTGGCTACT GTGGCTCTGC  1080
TACCTCAAGT CTGCATTGAA TCCGCTGATC TACTACTGGA GGATTAAGAA ATTCCATGAT  1140
GCTTGCCTGG ACATGATGCC TAAGTCCTTC AAGTTTTTGC CGCAGCTCCC TGGTCACACA  1200
AAGCGACGGA TACGTCCTAG TGCTGTCTAT GTGTGTGGGG AACATCGGAC GGTGGTGTGA  1260
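
The nucleotide listing for SEQ ID NO: 1 and the amino acid listing for SEQ ID NO: 2 can be cross-checked by translating the DNA with the standard genetic code. The Python sketch below is illustrative only; the `translate` helper is a hypothetical name, and only the first and last 60 nucleotides of SEQ ID NO: 1 are used.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (one-letter amino acids).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate DNA in frame 1; whitespace is ignored, '*' marks a stop codon."""
    dna = "".join(dna.split()).upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


# Nucleotides 1-60 and 1201-1260 of SEQ ID NO: 1 (both start on a codon boundary).
first_60 = "ATGGTCTTCT CGGCAGTGTT GACTGCGTTC CATACCGGGA CATCCAACAC AACATTTGTC"
last_60 = "AAGCGACGGA TACGTCCTAG TGCTGTCTAT GTGTGTGGGG AACATCGGAC GGTGGTGTGA"

print(translate(first_60))
print(translate(last_60))
```

The first 60 nucleotides translate to Met Val Phe Ser Ala Val Leu Thr Ala Phe His Thr Gly Thr Ser Asn Thr Thr Phe Val, and the last codons to Thr Val Val followed by a stop. The stated lengths also agree: 1260 bases are 420 codons, that is, 419 amino acids plus the stop codon.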
(3) INFORMATION FOR SEQ ID NO: 2:

     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 419 amino acids
          (B) TYPE: amino acid
          (C) STRANDEDNESS: single
          (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: DNA (genomic)

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:

Met Val Phe Ser Ala Val Leu Thr Ala Phe His Thr Gly Thr Ser Asn
1               5                   10                  15

Thr Thr Phe Val Val Tyr Glu Asn Thr Tyr Met Asn Ile Thr Leu Pro
            20                  25                  30

Pro Pro Phe Gln His Pro Asp Leu Ser Pro Leu Leu Arg Tyr Ser Phe
        35                  40                  45

Glu Thr Met Ala Pro Thr Gly Leu Ser Ser Leu Thr Val Asn Ser Thr
    50                  55                  60

Ala Val Pro Thr Thr Pro Ala Ala Phe Lys Ser Leu Asn Leu Pro Leu
65                  70                  75                  80

Gln Ile Thr Leu Ser Ala Ile Met Ile Phe Ile Leu Phe Val Ser Phe
                85                  90                  95

Leu Gly Asn Leu Val Val Cys Leu Met Val Tyr Gln Lys Ala Ala Met
            100                 105                 110

Arg Ser Ala Ile Asn Ile Leu Leu Ala Ser Leu Ala Phe Ala Asp Met
        115                 120                 125

Leu Leu Ala Val Leu Asn Met Pro Phe Ala Leu Val Thr Ile Leu Thr
    130                 135                 140

Thr Arg Trp Ile Phe Gly Lys Phe Phe Cys Arg Val Ser Ala Met Phe
145                 150                 155                 160

Phe Trp Leu Phe Val Ile Glu Gly Val Ala Ile Leu Leu Ile Ile Ser
                165                 170                 175

Ile Asp Arg Phe Leu Ile Ile Val Gln Arg Gln Asp Lys Leu Asn Pro
            180                 185                 190

Tyr Arg Ala Lys Val Leu Ile Ala Val Ser Trp Ala Thr Ser Phe Cys
        195                 200                 205

Val Ala Phe Pro Leu Ala Val Gly Asn Pro Asp Leu Gln Ile Pro Ser
    210                 215                 220

Arg Ala Pro Gln Cys Val Phe Gly Tyr Thr Thr Asn Pro Gly Tyr Gln
225                 230                 235                 240

Ala Tyr Val Ile Leu Ile Ser Leu Ile Ser Phe Phe Ile Pro Phe Leu
                245                 250                 255

Val Ile Leu Tyr Ser Phe Met Gly Ile Leu Asn Thr Leu Arg His Asn
            260                 265                 270

Ala Leu Arg Ile His Ser Tyr Pro Glu Gly Ile Cys Leu Ser Gln Ala
        275                 280                 285

Ser Lys Leu Gly Leu Met Ser Leu Gln Arg Pro Phe Gln Met Ser Ile
    290                 295                 300

Asp Met Gly Phe Lys Thr Arg Ala Phe Thr Thr Ile Leu Ile Leu Phe
305                 310                 315                 320

Ala Val Phe Ile Val Cys Trp Ala Pro Phe Thr Thr Tyr Ser Leu Val
                325                 330                 335

Ala Thr Phe Ser Lys His Phe Tyr Tyr Gln His Asn Phe Phe Glu Ile
            340                 345                 350

Ser Thr Trp Leu Leu Trp Leu Cys Tyr Leu Lys Ser Ala Leu Asn Pro
        355                 360                 365

Leu Ile Tyr Tyr Trp Arg Ile Lys Lys Phe His Asp Ala Cys Leu Asp
    370                 375                 380

Met Met Pro Lys Ser Phe Lys Phe Leu Pro Gln Leu Pro Gly His Thr
385                 390                 395                 400

Lys Arg Arg Ile Arg Pro Ser Ala Val Tyr Val Cys Gly Glu His Arg
                405                 410                 415

Thr Val Val

(4) INFORMATION FOR SEQ ID NO: 3:

     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 1119 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: single
          (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: DNA (genomic)

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:

ATGTTAGCCA ACAGCTCCTC AACCAACAGT TCTGTTCTCC CGTGTCCTGA CTACCGACCT    60
ACCCACCGCC TGCACTTGGT GGTCTACAGC TTGGTGCTGG CTGCCGGGCT CCCCCTCAAC   120
GCGCTAGCCC TCTGGGTCTT CCTGCGCGCG CTGCGCGTGC ACTCGGTGGT GAGCGTGTAC   180
ATGTGTAACC TGGCGGCCAG CGACCTGCTC TTCACCCTCT CGCTGCCCGT TCGTCTCTCC   240
TACTACGCAC TGCACCACTG GCCCTTCCCC GACCTCCTGT GCCAGACGAC GGGCGCCATC   300
TTCCAGATGA ACATGTACGG CAGCTGCATC TTCCTGATGC TCATCAACGT GGACCGCTAC   360
GCCGCCATCG TGCACCCGCT GCGACTGCGC CACCTGCGGC GGCCCCGCGT GGCGCGGCTG   420
CTCTGCCTGG GCGTGTGGGC GCTCATCCTG GTGTTTGCCG TGCCCGCCGC CCGCGTGCAC   480
AGGCCCTCGC GTTGCCGCTA CCGGGACCTC GAGGTGCGCC TATGCTTCGA GAGCTTCAGC   540
GACGAGCTGT GGAAAGGCAG GCTGCTGCCC CTCGTGCTGC TGGCCGAGGC GCTGGGCTTC   600
CTGCTGCCCC TGGCGGCGGT — CTCG — CCCAC TCTTCTGGAC CCTGCCCCCC   660
CCCGACGCCA CGCAGAGCCA GCGGCGGCGG AAGACCGTGC GCCTCCTCCT GGCTAACCTC   720
CTCATCTTCC TGCTGTGCTT CGTGCCCTAC AACAGCACGC TGGCGGTCTA CGCGCTCCTC   780
CGGAGCAAGC TGGTGGCGGC CAGCGTGCCT GCCCGCGATC GCGTGCGCGG GGTGCTGATG   840
GTGATGGTGC TGCTGGCCGG CGCCAACTGC GTGCTGGACC CGCTGGTGTA CTACTTTAGC   900
GCCGAGGGCT TCCGCAACAC CCTGCGCGCC CTGGGCACTC CGCACGGGGC CAGGACCTCG   960
GCCACCAACC GGAGGCGGGC GGCGCTCGCG CAATCCGAAA CGTCCCCCGT
CCCACGAGGC CGGATGCCGC CAGTCAGGGG CTGCTCCGAC CCTCCGACTC CCACTCTCTG  1080
TCTTCCTTCA CACAGTGTCC CCAGGATTCC GCCCTCTGA                         1119

(5) INFORMATION FOR SEQ ID NO: 4:

     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 372 amino acids
          (B) TYPE: amino acid
          (C) STRANDEDNESS:
          (D) TOPOLOGY: not relevant

    (ii) MOLECULE TYPE: protein

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:

Met Leu Ala Asn Ser Ser Ser Thr Asn Ser Ser Val Leu Pro Cys Pro
1               5                   10                  15

Asp Tyr Arg Pro Thr His Arg Leu His Leu Val Val Tyr Ser Leu Val
            20                  25                  30

Leu Ala Ala Gly Leu Pro Leu Asn Ala Leu Ala Leu Trp Val Phe Leu
        35                  40                  45

Arg Ala Leu Arg Val His Ser Val Val Ser Val Tyr Met Cys Asn Leu
    50                  55                  60

Ala Ala Ser Asp Leu Leu Phe Thr Leu Ser Leu Pro Val Arg Leu Ser
65                  70                  75                  80

Tyr Tyr Ala Leu His His Trp Pro Phe Pro Asp Leu Leu Cys Gln Thr
                85                  90                  95

Thr Gly Ala Ile Phe Gln Met Asn Met Tyr Gly Ser Cys Ile Phe Leu
            100                 105                 110

Met Leu Ile Asn Val Asp Arg Tyr Ala Ala Ile Val His Pro Leu Arg
        115                 120                 125

Leu Arg His Leu Arg Arg Pro Arg Val Ala Arg Leu Leu Cys Leu Gly
    130                 135                 140

Val Trp Ala Leu Ile Leu Val Phe Ala Val Pro Ala Ala Arg Val His
145                 150                 155                 160

Arg Pro Ser Arg Cys Arg Tyr Arg Asp Leu Glu Val Arg Leu Cys Phe
                165                 170                 175

Glu Ser Phe Ser Asp Glu Leu Trp Lys Gly Arg Leu Leu Pro Leu Val
            180                 185                 190

Leu Leu Ala Glu Ala Leu Gly Phe Leu Leu Pro Leu Ala Ala Val Val
        195                 200                 205

Tyr Ser Ser Gly Arg Val Phe Trp Thr Leu Ala Arg Pro Asp Ala Thr
    210                 215                 220

Gln Ser Gln Arg Arg Arg Lys Thr Val Arg Leu Leu Leu Ala Asn Leu
225                 230                 235                 240

Val Ile Phe Leu Leu Cys Phe Val Pro Tyr Asn Ser Thr Leu Ala Val
                245                 250                 255

Tyr Gly Leu Leu Arg Ser Lys Leu Val Ala Ala Ser Val Pro Ala Arg
            260                 265                 270

Asp Arg Val Arg Gly Val Leu Met Val Met Val Leu Leu Ala Gly Ala
        275                 280                 285

Asn Cys Val Leu Asp Pro Leu Val Tyr Tyr Phe Ser Ala Glu Gly Phe
    290                 295                 300

Arg Asn Thr Leu Arg Gly Leu Gly Thr Pro His Arg Ala Arg Thr Ser
305                 310                 315                 320

Ala Thr Asn Gly Thr Arg Ala Ala Leu Ala Gln Ser Glu Arg Ser Ala
                325                 330                 335

Val Thr Thr Asp Ala Thr Arg Pro Asp Ala Ala Ser Gln Gly Leu Leu
            340                 345                 350

Arg Pro Ser Asp Ser His Ser Leu Ser Ser Phe Thr Gln Cys Pro Gln
        355                 360                 365

Asp Ser Ala Leu
    370

(6) INFORMATION FOR SEQ ID NO: 5:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1107 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:

ATGGCCAACT CCACAGGGCT GAACGCCTCA GAAGTCGCAG GCTCGTTGGG GTTGATCCTG 60

OCACCXGTCG TGGAGGTGGG GGCACTGCTG GGCAACGGCG CGCTGCTGGT CGTGGTGCTG 120

CGCACGCCGG GACTGCGCGA CCCCCTCTAC C.GGCGCACC TGTGCGTCGT GGACCTGCTG 180

GCGGCCGCCT CCATCATGCC GCTGGGCCTG C™ - 

t iuu , LUU , t CGCCGCCCGG GCTGGGCCGC 24 0 

OTGCGCC.GG .CCCCCCOCC ATCCCGCGCC cc T ccc TT cc TCT ccgccgc TCTCCT CCCC 300 
GCCTGCACCC TCGGGGTGGC CGCACTTGGC CTGGCACCCT ACCCCCTCAT CCTGCACCCG 360
CTGCGCCCAC CCTCCCCCCC CCCCCCTG.C « T CA CCCCCGTC.G GGCCGCGGCC 420
GGACTGCTGG GCCCGCTC.C CC.CC.CCCC CCGCCGCCCC CACCGCCCCC TGCXCCTCCT 480
CGCTGC.CGC TCC.GCC.GG GGCCCCCGG CCCTTCCGGC CGCTC.GGGC CC3GCTCCCC 540
TTCGCGCTGC CCGCCCTCCT CCTCCTCGGC GCCTACGGCC GCATCTTCGT GGTGGCGCGT 600
CGCGCTGCCC TGAGGCCCCC ACGGCCGGCG CGCGGGTCCC GACTCCGCTC GGACTCTCTG 660
GATAGCCGCC TTTCCATCTT GCCGCCGCTC CGGCCTCGCC TGCCCGGGGG CAAGGCGGCC 720
^CGCCCCAG CGCTGGCCGT GGGCCAATTT GCAGCCTGCT GGCTGCCTTA TGGCTCCGCG 780
TGCCTGGCGC CCGCAGCGCG GGCCGCGGAA GCCGAAGCGG CTGTCACCTG GG.CGCCXAC 840
TCGGCCTTCG CGGCTCACCC CTTCCTGTAC GGGCTGCTGC AGCGCCCCGT GCGCTTGGCA 900
CGGGCCGCC TCTCTCGCCG XGCACXGGCT GGACCTGTGC GCGCCTGCAC TCCCCAACCC 960
TGGGACCGGC GGGCACTCTT GCAATCCCTC CAGAGACCCC CAGAGGGCCC TGCCGTAGGC 1020
CCTTCTGAGG CTCCAGAACA GACGCCCGAG TTGGCAGGAG GGCGGAGCCC CGCATACCAC 1080
GGGCCACCTG AGAGTTCTCT CTCCTGA 1107

(7) INFORMATION FOR SEQ ID NO: 6:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 368 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:

Met Ala Asn Ser Thr Gly Leu Asn Ala Ser Glu Val Ala Gly Ser Leu
1 5 10 15

Gly Leu Ile Leu Ala Ala Val Val Glu Val Gly Ala Leu Leu Gly Asn
20 25 30

Gly Ala Leu Leu Val Val Val Leu Arg Thr Pro Gly Leu Arg Asp Ala
35 40 45

Leu Tyr Leu Ala His Leu Cys Val Val Asp Leu Leu Ala Ala Ala Ser
50 55 60

Ile Met Pro Leu Gly Leu Leu Ala Ala Pro Pro Pro Gly Leu Gly Arg
65 70 75 80

Val Arg Leu Gly Pro Ala Pro Cys Arg Ala Ala Arg Phe Leu Ser Ala
85 90 95

Ala Leu Leu Pro Ala Cys Thr Leu Gly Val Ala Ala Leu Gly Leu Ala
100 105 110

Arg Tyr Arg Leu Ile Val His Pro Leu Arg Pro Gly Ser Arg Pro Pro
115 120 125

Pro Val Leu Val Leu Thr Ala Val Trp Ala Ala Ala Gly Leu Leu Gly
130 135 140

Ala Leu Ser Leu Leu Gly Pro Pro Pro Ala Pro Pro Pro Ala Pro Ala
145 150 155 160

Arg Cys Ser Val Leu Ala Gly Gly Leu Gly Pro Phe Arg Pro Leu Trp
165 170 175

Ala Leu Leu Ala Phe Ala Leu Pro Ala Leu Leu Leu Leu Gly Ala Tyr
180 185 190

Gly Gly Ile Phe Val Val Ala Arg Arg Ala Ala Leu Arg Pro Pro Arg
195 200 205

Pro Ala Arg Gly Ser Arg Leu Arg Ser Asp Ser Leu Asp Ser Arg Leu
210 215 220

Ser Ile Leu Pro Pro Leu Arg Pro Arg Leu Pro Gly Gly Lys Ala Ala
225 230 235 240

Leu Ala Pro Ala Leu Ala Val Gly Gln Phe Ala Ala Cys Trp Leu Pro
245 250 255

Tyr Gly Cys Ala Cys Leu Ala Pro Ala Ala Arg Ala Ala Glu Ala Glu
260 265 270

Ala Ala Val Thr Trp Val Ala Tyr Ser Ala Phe Ala Ala His Pro Phe
275 280 285



Leu Tyr Gly Leu Leu Gln Arg Pro Val Arg Leu Ala Leu Gly Arg Leu
290 295 300

Arg A , 9 Ala Leu Pro Gr y Pro Val Arg Ala Cys Thr ^ ^ ^ 

U IIS 

Jlb 320 

T1 " P ^ LSU LCU c ya Leu Gln Arg Pro Pro Qlu G 

5 330 335 

Pro Ala Val Gly Pro q^v- pi,, *i 

Y Iro Ser Glu Ala Pro Glu Gin Thr Pro Glu Leu Ala 

345 350 

Gly Gly Arq Ser Pro sl a T ,.„ „T r ~, 

3 55 " " ^ Y Pr ° Pr ° Glu Ser Ser Leu Ser 



360 365 



(8) INFORMATION FOR SEQ ID NO: 7:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1008 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:
ATGGAATCAT CTTTCTCATT TGGAGTGATC CTTGCTGTCC TGGCCTCCCT CATCATTGCT 60
ACTAACACAC TAGTGGCTGT GGCTGTGCTG CTGTTGATCC ACAAGAATGA TGGTGTCAGT 120
CTCTGCTTCA CCTTGAATCT GGCTGTGGCT GACACCTTGA TTGGTGTGGC CATCTCTGGC 180
CTACTCACAG ACCAGCTCTC CAGCCCTTCT CGGCCCACAC AGAAGACCCT GTGCAGCCTG 240
CGGATGGCAT TTGTCACTTC CTCCGCAGCT GCCTCTGTCC TCACGGTCAT GCTGATCACC 300
TTTGACAGGT ACCTTGCCAT CAAGCAGCCC TTCCGCTACT TGAAGATCAT GAGTGGGTTC 360
GTGGCCGGGG CCTGCATTGC CGGGCTGTGG TTAGTGTCTT ACCTCATTGG CTTCCTCCCA 420
CTCGGAATCC CCATGTTCCA GCAGACTGCC TACAAAGGGC AGTGCAGCTT CTTTGCTGTA 480
TTTCACCCTC ACTTCGTGCT GACCCTCTCC TGCGTTGGCT TCTTCCCAGC CATGCTCCTC 540
TTTGTCTTCT TCTACTGCGA CATGCTCAAG ATTGCCTCCA TGCACAGCCA GCAGATTCGA 600
AAGATGGAAC ATGCAGGAGC CATGGCTGGA GGTTATCGAT CCCCACGGAC TCCCAGCGAC 660
TTCAAAGCTC TCCGTACTGT GTCTGTTCTC ATTGGGAGCT TTGCTCTATC CTGGACCCCC 720
TTCCTTATCA CTGGCATTGT GCAGGTGGCC TGCCAGGAGT GTCACCTCTA CCTAGTGCTG 780
GAACGGTACC TGTGGCTGCT CGGCGTGGGC AACTCCCTGC TCAACCCACT CATCTATGCC 840
TATTGGCAGA AGGAGGTGCG ACTGCAGCTC TACCACATGG CCCTAGGAGT GAAGAAGGTG 900
CTCACCTCAT TCCTCCTCTT TCTCTCGGCC AGGAATTGTG GCCCAGAGAG GCCCAGGGAA 960
AGTTCCTGTC ACATCGTCAC TATCTCCAGC TCAGAGTTTG ATGGCTAA 1008
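
As an illustrative cross-check that is not part of the original sequence listing, the following minimal Python sketch translates the first 60 bases of SEQ ID NO: 7 with the standard genetic code and derives the residue count implied by a 1008-base open reading frame that ends in one stop codon. The names `CODON_TABLE`, `translate`, `first_60` and `expected_residues` are hypothetical helpers introduced here for illustration only; the sketch assumes the listed strand is the coding strand read in frame from its first base.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a coding strand to one-letter residues, stopping at a stop codon."""
    seq = "".join(dna.split()).upper()  # drop the spaces between 10-base blocks
    residues = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":
            break
        residues.append(aa)
    return "".join(residues)


# First line (bases 1-60) of SEQ ID NO: 7 as listed above.
first_60 = "ATGGAATCAT CTTTCTCATT TGGAGTGATC CTTGCTGTCC TGGCCTCCCT CATCATTGCT"
prefix = translate(first_60)

# 1008 bases = 336 codons; one of them is the terminal stop codon.
expected_residues = 1008 // 3 - 1

print(prefix)
print(expected_residues)
```

The 20 translated residues can be compared against the first 20 residues listed for SEQ ID NO: 8, and the derived count against the 335 amino acids stated in its sequence characteristics.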
(9) INFORMATION FOR SEQ ID NO: 8:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 335 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:

Met Glu Ser Ser Phe Ser Phe Gly Val Ile Leu Ala Val Leu Ala Ser
1 5 10 15

Leu Ile Ile Ala Thr Asn Thr Leu Val Ala Val Ala Val Leu Leu Leu
20 25 30

Ile His Lys Asn Asp Gly Val Ser Leu Cys Phe Thr Leu Asn Leu Ala
35 40 45

Val Ala Asp Thr Leu Ile Gly Val Ala Ile Ser Gly Leu Leu Thr Asp
50 55 60

Gln Leu Ser Ser Pro Ser Arg Pro Thr Gln Lys Thr Leu Cys Ser Leu
65 70 75 80

Arg Met Ala Phe Val Thr Ser Ser Ala Ala Ala Ser Val Leu Thr Val
85 90 95

Met Leu Ile Thr Phe Asp Arg Tyr Leu Ala Ile Lys Gln Pro Phe Arg
100 105 110

Tyr Leu Lys Ile Met Ser Gly Phe Val Ala Gly Ala Cys Ile Ala Gly
115 120 125

Leu Trp Leu Val Ser Tyr Leu Ile Gly Phe Leu Pro Leu Gly Ile Pro
130 135 140

Met Phe Gln Gln Thr Ala Tyr Lys Gly Gln Cys Ser Phe Phe Ala Val
145 150 155 160

Phe His Pro His Phe Val Leu Thr Leu Ser Cys Val Gly Phe Phe Pro
165 170 175

Ala Met Leu Leu Phe Val Phe Phe Tyr Cys Asp Met Leu Lys Ile Ala
180 185 190

Ser Met His Ser Gln Gln Ile Arg Lys Met Glu His Ala Gly Ala Met
195 200 205

Ala Gly Gly Tyr Arg Ser Pro Arg Thr Pro Ser Asp Phe Lys Ala Leu
210 215 220

Arg Thr Val Ser Val Leu Ile Gly Ser Phe Ala Leu Ser Trp Thr Pro
225 230 235 240

Phe Leu Ile Thr Gly Ile Val Gln Val Ala Cys Gln Glu Cys His Leu
245 250 255

Tyr Leu Val Leu Glu Arg Tyr Leu Trp Leu Leu Gly Val Gly Asn Ser
260 265 270

Leu Leu Asn Pro Leu Ile Tyr Ala Tyr Trp Gln Lys Glu Val Arg Leu
275 280 285

Gln Leu Tyr His Met Ala Leu Gly Val Lys Lys Val Leu Thr Ser Phe
290 295 300

Leu Leu Phe Leu Ser Ala Arg Asn Cys Gly Pro Glu Arg Pro Arg Glu
305 310 315 320

Ser Ser Cys His Ile Val Thr Ile Ser Ser Ser Glu Phe Asp Gly
325 330 335

(10) INFORMATION FOR SEQ ID NO: 9:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1413 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:

ATGGACACTA CCATGGAAGC TGACCTGGGT GCCACTGGCC ACAGGCCCCG CACAGAGCTT 60
GATGATGAGG ACTCCTACCC CCAAGGTGGC TGGGACACGG TCTTCCTGGT GGCCCTGCTG 120
CTCCTTGGGC TGCCAGCCAA TGGGTTGATG GCGTGGCTGG CCGGCTCCCA GGCCCGGCAT 180
GGAGCTGGCA CGCGTCTGGC GCTGCTCCTG CTCACCCTGG CCCTCTCTGA CTTCTTGTTC 240
CTGGCAGCAG CGGCCTTCCA GATCCTAGAG ATCCGGCATG GGGGACACTG GCCGCTGGGG 300
ACAGCTGCCT GCCGCTTCTA CTACTTCCTA TGGGGCGTGT CCTACTCCTC CGGCCTCTTC 360
CTGCTGGCCG CCCTCAGCCT CGACCGCTGC CTGCTGGCGC TGTGCCCACA CTGGTACCCT 420
GGCCACCGCC CAGTCCGCCT GCCCCTCTGG GTCTGCGCCG GTGTCTGGGT GCTGGCCACA 480
CTCTTCAGCG TGCCCTGGCT GGTCTTCCCC GAGGCTGCCG TCTGGTGGTA CGACCTGGTC 540
ATCTGCCTGG ACTTCTGGGA CAGCGAGGAG CTGTCGCTGA GGATGCTGGA GGTCCTGGGG 600
GGCTTCCTGC CTTTCCTCCT GCTGCTCGTC TGCCACGTGC TCACCCAGGC CACAGCCTGT 660
CGCACCTGCC ACCGCCAACA GCAGCCCGCA GCCTGCCGGG GCTTCGCCCG TGTGGCCAGG 720
ACCATTCTGT CAGCCTATGT GGTCCTGAGG CTGCCCTACC AGCTGGCCCA GCTGCTCTAC 780
CTGGCCTTCC TGTGGGACGT CTACTCTGGC TACCTGCTCT GGGAGGCCCT GGTCTACTCC 840
GACTACCTGA TCCTACTCAA CAGCTGCCTC AGCCCCTTCC TCTGCCTCAT GGCCAGTGCC 900
GACCTCCGGA CCCTGCTGCG CTCCGTGCTC TCGTCCTTCG CGGCAGCTCT CTGCGAGGAG 960
CGGCCGGGCA GCTTCACGCC CACTGAGCCA CAGACCCAGC TAGATTCTGA GGGTCCAACT 1020
CTGCCAGAGC CGATGGCAGA GGCCCAGTCA CAGATGGATC CTGTGGCCCA GCCTCAGGTG 1080
AACCCCACAC TCCAGCCACG ATCGGATCCC ACAGCTCAGC CACAGCTGAA CCCTACGGCC 1140
CAGCCACAGT CGGATCCCAC AGCCCAGCCA CAGCTGAACC TCATGGCCCA GCCACAGTCA 1200
GATTCTGTGG CCCAGCCACA GGCAGACACT AACGTCCAGA CCCCTGCACC TGCTGCCAGT 1260
TCTGTGCCCA GTCCCTGTGA TGAAGCTTCC CCAACCCCAT CCTCGCATCC TACCCCAGGG 1320
GCCCTTGAGG ACCCAGCCAC ACCTCCTGCC TCTGAAGGAG AAAGCCCCAG CAGCACCCCG 1380
CCAGAGGCGG CCCCGGGCGC AGGCCCCACG TGA 1413

(11) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 468 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS : 

(D) TOPOLOGY: not relevant 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 

Met Asp Thr Thr Met Glu Ala Asp Leu Gly Ala Thr Gly His Arg Pro
1 5 10 15

Arg Thr Glu Leu Asp Asp Glu Asp Ser Tyr Pro Gln Gly Gly Trp Asp
20 25 30

Thr Val Phe Leu Val Ala Leu Leu Leu Leu Gly Leu Pro Ala Asn Gly
35 40 45

Leu Met Ala Trp Leu Ala Gly Ser Gln Ala Arg His Gly Ala Gly Thr
50 55 60

Arg Leu Ala Leu Leu Leu Leu Ser Leu Ala Leu Ser Asp Phe Leu Phe
65 70 75 80

Leu Ala Ala Ala Ala Phe Gln Ile Leu Glu Ile Arg His Gly Gly His
85 90 95

Trp Pro Leu Gly Thr Ala Ala Cys Arg Phe Tyr Tyr Phe Leu Trp Gly
100 105 110

Val Ser Tyr Ser Ser Gly Leu Phe Leu Leu Ala Ala Leu Ser Leu Asp
115 120 125

Arg Cys Leu Leu Ala Leu Cys Pro His Trp Tyr Pro Gly His Arg Pro
130 135 140

Val Arg Leu Pro Leu Trp Val Cys Ala Gly Val Trp Val Leu Ala Thr
145 150 155 160

Leu Phe Ser Val Pro Trp Leu Val Phe Pro Glu Ala Ala Val Trp Trp
165 170 175

Tyr Asp Leu Val Ile Cys Leu Asp Phe Trp Asp Ser Glu Glu Leu Ser
180 185 190

Leu Arg Met Leu Glu Val Leu Gly Gly Phe Leu Pro Phe Leu Leu Leu
195 200 205

Leu Val Cys His Val Leu Thr Gln Ala Thr Arg Thr Cys His Arg Gln
210 215 220

Gln Gln Pro Ala Ala Cys Arg Gly Phe Ala Arg Val Ala Arg Thr Ile
225 230 235 240

Leu Ser Ala Tyr Val Val Leu Arg Leu Pro Tyr Gln Leu Ala Gln Leu
245 250 255

Leu Tyr Leu Ala Phe Leu Trp Asp Val Tyr Ser Gly Tyr Leu Leu Trp
260 265 270

Glu Ala Leu Val Tyr Ser Asp Tyr Leu Ile Leu Leu Asn Ser Cys Leu
275 280 285

Ser Pro Phe Leu Cys Leu Met Ala Ser Ala Asp Leu Arg Thr Leu Leu
290 295 300

Arg Ser Val Leu Ser Ser Phe Ala Ala Ala Leu Cys Glu Glu Arg Pro
305 310 315 320

Gly Ser Phe Thr Pro Thr Glu Pro Gln Thr Gln Leu Asp Ser Glu Gly
325 330 335

Pro Thr Leu Pro Glu Pro Met Ala Glu Ala Gln Ser Gln Met Asp Pro
340 345 350

Val Ala Gln Pro Gln Val Asn Pro Thr Leu Gln Pro Arg Ser Asp Pro
355 360 365

Thr Ala Gln Pro Gln Leu Asn Pro Thr Ala Gln Pro Gln Ser Asp Pro
370 375 380

Thr Ala Gln Pro Gln Leu Asn Leu Met Ala Gln Pro Gln Ser Asp Ser
385 390 395 400

Val Ala Gln Pro Gln Ala Asp Thr Asn Val Gln Thr Pro Ala Pro Ala
405 410 415

Ala Ser Ser Val Pro Ser Pro Cys Asp Glu Ala Ser Pro Thr Pro Ser
420 425 430

Ser His Pro Thr Pro Gly Ala Leu Glu Asp Pro Ala Thr Pro Pro Ala
435 440 445

Ser Glu Gly Glu Ser Pro Ser Ser Thr Pro Pro Glu Ala Ala Pro Gly
450 455 460

Ala Gly Pro Thr
465

(12) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1248 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11: 

ATGTCAGGGA TGGAAAAACT TCAGAATGCT TCCTGGATCT ACCAGCAGAA ACTAGAAGAT 60
CCATTCCAGA AACACCTGAA CAGCACCGAG GAGTATCTGG CCTTCCTCTG CGGACCTCGG 120
CGCAGCCACT TCTTCCTCCC CGTGTCTGTG GTGTATGTGC CAATTTTTGT GGTGGGGGTC 180
ATTGGCAATG TCCTGGTGTG CCTGGTGATT CTGCAGCACC AGGCTATGAA GACGCCCACC 240
AACTACTACC TCTTCAGCCT GGCGGTCTCT GACCTCCTGG TCCTGCTCCT TGGAATGCCC 300
CTGGAGGTCT ATGAGATGTG GCGCAACTAC CCTTTCTTGT TCGGGCCCGT GGGCTGCTAC 360
TTCAAGACGG CCCTCTTTGA GACCGTGTGC TTCGCCTCCA TCCTCAGCAT CACCACCGTC 420
AGCGTGGAGC GCTACGTGGC CATCCTACAC CCGTTCCGCG CCAAACTGCA GAGCACCCGG 480
CGCCGGGCCC TCAGGATCCT CGGCATCGTC TGGGGCTTCT CCGTGCTCTT CTCCCTGCCC 540
AACACCAGCA TCCATCGCAT CAAGTTCCAC TACTTCCCCA ATGGGTCCCT GGTCCCAGGT 600
TCGGCCACC, GTACGGTCAT CAAGCCCATG TGGATCTACA ATTTCATCAT CCACGTCACC 660
TCCTTCCTAT TCTACCTCCT CCCCATGACT GTCATCAGTG TCCTCTACTA CCTCATGGCA 720
CTCAGACTAA AGAAAGACAA ATCTCTTGAG GCAGATGAAG GGAATGCAAA TATTCAAAGA 780
CCCTGCAGAA AATCAGTCAA CAAGATGCTG TTTGTCTTGG TCTTAGTGTT TGCTATCTGT 840
TGGGCCCCGT TCCACATTGA CCGACTCTTC TTCAGCTTTG TGGAGGAGTG GAGTGAATCC 900
CTGGCTGCTG TGTTCAACCT CGTCCATGTG GTGTCAGGTG TCTTCTTCTA CCTGAGCTCA 960
CCTCTCAACC CCATTATCTA TAACCTACTG TCTCGCCGCT TCCAGGCAGC ATTCCAGAAT 1020
GTGATCTCTT CTTTCCACAA ACAGTGGCAC TCCCAGCATG ACCCACAGTT
CAGCGGAACA TCTTCCTGAC AGAATGCCAC TTTGTGGAGC TGACCGAAGA TATAGGTCCC 1140
C- T CCCA T GTCAGTCATC CATGCACAAC TCTCACCTCC CAACAGCCCT CTCTAGTGAA 1200
CAGATGTCAA GAACAAACTA TCAAAGCTTC CACTTTAACA AAACCTGA 1248

(13) INFORMATION FOR SEQ ID NO: 12:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 415 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:

Met Ser Gly Met Glu Lys Leu Gln Asn Ala Ser Trp Ile Tyr Gln Gln
1 5 10 15

Lys Leu Glu Asp Pro Phe Gln Lys His Leu Asn Ser Thr Glu Glu Tyr
20 25 30

Leu Ala Phe Leu Cys Gly Pro Arg Arg Ser His Phe Phe Leu Pro Val
35 40 45

Ser Val Val Tyr Val Pro Ile Phe Val Val Gly Val Ile Gly Asn Val
50 55 60

Leu Val Cys Leu Val Ile Leu Gln His Gln Ala Met Lys Thr Pro Thr
65 70 75 80

Asn Tyr Tyr Leu Phe Ser Leu Ala Val Ser Asp Leu Leu Val Leu Leu
85 90 95






Leu Gly Met Pro Leu Glu Val Tyr Glu Met Trp Arg Asn Tyr Pro Phe
100 105 110

Leu Phe Gly Pro Val Gly Cys Tyr Phe Lys Thr Ala Leu Phe Glu Thr
115 120 125

Val Cys Phe Ala Ser Ile Leu Ser Ile Thr Thr Val Ser Val Glu Arg
130 135 140

Tyr Val Ala Ile Leu His Pro Phe Arg Ala Lys Leu Gln Ser Thr Arg
145 150 155 160

Arg Arg Ala Leu Arg Ile Leu Gly Ile Val Trp Gly Phe Ser Val Leu
165 170 175

Phe Ser Leu Pro Asn Thr Ser Ile His Gly Ile Lys Phe His Tyr Phe
180 185 190

Pro Asn Gly Ser Leu Val Pro Gly Ser Ala Thr Cys Thr Val Ile Lys
195 200 205

Pro Met Trp Ile Tyr Asn Phe Ile Ile Gln Val Thr Ser Phe Leu Phe
210 215 220

Tyr Leu Leu Pro Met Thr Val Ile Ser Val Leu Tyr Tyr Leu Met Ala
225 230 235 240

Leu Arg Leu Lys Lys Asp Lys Ser Leu Glu Ala Asp Glu Gly Asn Ala
245 250 255

Asn Ile Gln Arg Pro Cys Arg Lys Ser Val Asn Lys Met Leu Phe Val
260 265 270

Leu Val Leu Val Phe Ala Ile Cys Trp Ala Pro Phe His Ile Asp Arg
275 280 285

Leu Phe Phe Ser Phe Val Glu Glu Trp Ser Glu Ser Leu Ala Ala Val
290 295 300

Phe Asn Leu Val His Val Val Ser Gly Val Phe Phe Tyr Leu Ser Ser
305 310 315 320

Ala Val Asn Pro Ile Ile Tyr Asn Leu Leu Ser Arg Arg Phe Gln Ala
325 330 335

Ala Phe Gln Asn Val Ile Ser Ser Phe His Lys Gln Trp His Ser Gln
340 345 350

His Asp Pro Gln Leu Pro Pro Ala Gln Arg Asn Ile Phe Leu Thr Glu
355 360 365

Cys His Phe Val Glu Leu Thr Glu Asp Ile Gly Pro Gln Phe Pro Cys
370 375 380

Gln Ser Ser Met His Asn Ser His Leu Pro Thr Ala Leu Ser Ser Glu
385 390 395 400

Gln Met Ser Arg Thr Asn Tyr Gln Ser Phe His Phe Asn Lys Thr
405 410 415

(14) INFORMATION FOR SEQ ID NO: 13:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1173 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13:

ATGCCAGATA CTAATAGCAC AATCAATTTA TCACTAAGCA CTCGTGTTAC TTTAGCATTT 60
TTTATGTCCT TAGTAGCTTT TGCTATAATG CTAGGAAATG CTTTGGTCAT TTTAGCTTTT 120
GTGGTGGACA AAAACCTTAG ACATCGAAGT AGTTATTTTT TTCTTAACTT GGCCATCTCT 180
GACTTCTTTG TGGGTGTGAT CTCCATTCCT TTGTACATCC CTCACACGCT GTTCGAATGG 240
GATTTTGGAA AGGAAATCTG TGTATTTTGG CTCACTACTG ACTATCTGTT ATGTACAGCA 300
TCTGTATATA ACATTGTCCT CATCAGCTAT GATCGATACC TGTCAGTCTC AAATGCTGTG 360
TCTTATAGAA CTCAACATAC TGGGGTCTTG AAGATTGTTA CTCTGATGGT GGCCGTTTGG 420
GTGCTGGCCT TCTTAGTGAA TGGGCCAATG ATTCTAGTTT CAGAGTCTTG GAAGGATGAA 480
GGTAGTGAAT GTGAACCTGG ATTTTTTTCG GAATGGTACA TCCTTGCCAT CACATCATTC 540
TTGGAATTCG TGATCCCAGT CATCTTAGTC GCTTATTTCA ACATGAATAT TTATTGGAGC 600
CTGTGGAAGC GTGATCATCT CAGTAGGTGC CAAAGCCATC CTGGACTGAC TGCTGTCTCT 660
TCCAACATCT GTGGACACTC ATTCAGAGGT AGACTATCTT CAAGGAGATC TCTTTCTGCA 720
TCGACAGAAG TTCCTGCATC CTTTCATTCA GAGAGACAGA GGAGAAAGAG TAGTCTCATG 780
TTTTCCTCAA GAACCAAGAT GAATAGCAAT ACAATTGCTT CCAAAATGGG TTCCTTCTCC 840
CAATCAGATT CTGTAGCTCT TCACCAAAGG GAACATGTTG AACTGCTTAG AGCCAGGAGA 900
TTAGCCAAGT CACTGGCCAT TCTCTTAGGG GTTTTTGCTG TTTGCTGGGC TCCATATTCT 960
CTGTTCACAA TTGTCCTTTC ATTTTATTCC TCAGCAACAG GTCCTAAATC AGTTTGGTAT 1020
AGAATTGCAT TTTGGCTTCA GTGGTTCAAT TCCTTTGTCA ATCCTCTTTT GTATCCATTG 1080
TGTCACAAGC GCTTTCAAAA GGCTTTCTTG AAAATATTTT GTATAAAAAA GCAACCTCTA 1140
CCATCACAAC ACAGTCGGTC AGTATCTTCT TAA 1173

(15) INFORMATION FOR SEQ ID NO: 14:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 390 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14:

Met Pro Asp Thr Asn Ser Thr Ile Asn Leu Ser Leu Ser Thr Arg Val
1 5 10 15

Thr Leu Ala Phe Phe Met Ser Leu Val Ala Phe Ala Ile Met Leu Gly
20 25 30

Asn Ala Leu Val Ile Leu Ala Phe Val Val Asp Lys Asn Leu Arg His
35 40 45

Arg Ser Ser Tyr Phe Phe Leu Asn Leu Ala Ile Ser Asp Phe Phe Val
50 55 60

Gly Val Ile Ser Ile Pro Leu Tyr Ile Pro His Thr Leu Phe Glu Trp
65 70 75 80

Asp Phe Gly Lys Glu Ile Cys Val Phe Trp Leu Thr Thr Asp Tyr Leu
85 90 95

Leu Cys Thr Ala Ser Val Tyr Asn Ile Val Leu Ile Ser Tyr Asp Arg
100 105 110

Tyr Leu Ser Val Ser Asn Ala Val Ser Tyr Arg Thr Gln His Thr Gly
115 120 125

Val Leu Lys Ile Val Thr Leu Met Val Ala Val Trp Val Leu Ala Phe
130 135 140

Leu Val Asn Gly Pro Met Ile Leu Val Ser Glu Ser Trp Lys Asp Glu
145 150 155 160

Gly Ser Glu Cys Glu Pro Gly Phe Phe Ser Glu Trp Tyr Ile Leu Ala
165 170 175

Ile Thr Ser Phe Leu Glu Phe Val Ile Pro Val Ile Leu Val Ala Tyr
180 185 190

Phe Asn Met Asn Ile Tyr Trp Ser Leu Trp Lys Arg Asp His Leu Ser
195 200 205

Arg Cys Gln Ser His Pro Gly Leu Thr Ala Val Ser Ser Asn Ile Cys
210 215 220

Gly His Ser Phe Arg Gly Arg Leu Ser Ser Arg Arg Ser Leu Ser Ala
225 230 235 240

Ser Thr Glu Val Pro Ala Ser Phe His Ser Glu Arg Gln Arg Arg Lys
245 250 255

Ser Ser Leu Met Phe Ser Ser Arg Thr Lys Met Asn Ser Asn Thr Ile
260 265 270

Ala Ser Lys Met Gly Ser Phe Ser Gln Ser Asp Ser Val Ala Leu His
275 280 285

Gln Arg Glu His Val Glu Leu Leu Arg Ala Arg Arg Leu Ala Lys Ser
290 295 300

Leu Ala Ile Leu Leu Gly Val Phe Ala Val Cys Trp Ala Pro Tyr Ser
305 310 315 320

Leu Phe Thr Ile Val Leu Ser Phe Tyr Ser Ser Ala Thr Gly Pro Lys
325 330 335

Ser Val Trp Tyr Arg Ile Ala Phe Trp Leu Gln Trp Phe Asn Ser Phe
340 345 350

Val Asn Pro Leu Leu Tyr Pro Leu Cys His Lys Arg Phe Gln Lys Ala
355 360 365

Phe Leu Lys Ile Phe Cys Ile Lys Lys Gln Pro Leu Pro Ser Gln His
370 375 380

Ser Arg Ser Val Ser Ser
385 390

(16) INFORMATION FOR SEQ ID NO: 15:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1128 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15:

ATGGCGAACC CGAGCGAGCC GGGTGGCAGC GGCGGCGGCG AGGCGGCCGC CCTGGGCCTC 60
AAGCTGGCCA CGCTCAGCCT GCTCCTGTGC GTGAGCCTAG CGGGCAACGT GCTGTTCGCG 120
CTGCTGATCG TGCGGGAGCG CAGCCTGCAC CGCGCCCCGT ACTACCTGCT GCTCGACCTG 180
TGCCTGGCCG ACGGGCTGCG CGCGCTCGCC TGCCTCCCGG CCGTCATGCT GGCGGCGCGG 240
CGTGCGGCGG CCGCGGCGGG GGCGCCGCCG GGCGCGCTGG GCTGCAAGCT GCTCGCCTTC 300
CTGGCCGCGC TCTTCTGCTT CCACGCCGCC TTCCTGCTGC TGGGCGTGGG CGTCACCCGC 360
TACCTGGCCA TCGCGCACCA CCGCTTCTAT GCAGAGCGCC TGGCCGGCTG GCCGTGCGCC 420
GCCATGCTGG TGTGCGCCGC CTGGGCGCTG GCGCTGGCCG CGGCCTTCCC GCCAGTGCTG 480
GACGGCGGTG GCGACGACGA GGACGCGCCG TGCGCCCTGG AGCAGCGGCC CGACGGCGCC 540
CCCGGCGCGC TGGGCTTCCT GCTGCTGCTG GCCGTGGTGG TGGGCGCCAC GCACCTCGTC 600
TACCTCCGCC TGCTCTTCTT CATCCACGAC CGCCGCAAGA TGCGGCCCGC GCGCCTGGTG 660
CCCGCCGTCA GCCACGACTG GACCTTCCAC GGCCCGGGCG CCACCGGCCA GGCGGCCGCC 720
AACTGGACGG CGGGCTTCGG CCGCGGGCCC ACGCCGCCCG CGCTTGTGGG CATCCGGCCC 780
GCAGGGCCGG GCCGCGGCGC GCGCCGCCTC CTCGTGCTGG AAGAATTCAA GACGGAGAAG 840
AGGCTGTGCA AGATGTTCTA CGCCGTCACG CTGCTCTTCC TGCTCCTCTG GGGGCCCTAC 900
GTCGTGGCCA GCTACCTGCG GGTCCTGGTG CGGCCCGGCG CCGTCCCCCA GGCCTACCTG 960
ACGGCCTCCG TGTGGCTGAC CTTCGCGCAG GCCGGCATCA ACCCCGTCGT GTGCTTCCTC 1020
TTCAACAGGG AGCTGAGGGA CTGCTTCAGG GCCCAGTTCC CCTGCTGCCA GAGCCCCCGG 1080
ACCACCCAGG CGACCCATCC CTGCGACCTG AAAGGCATTG GTTTATGA 1128

(17) INFORMATION FOR SEQ ID NO: 16:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 375 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16:

Met Ala Asn Ala Ser Glu Pro Gly Gly Ser Gly Gly Gly Glu Ala Ala
1 5 10 15

Ala Leu Gly Leu Lys Leu Ala Thr Leu Ser Leu Leu Leu Cys Val Ser
20 25 30

Leu Ala Gly Asn Val Leu Phe Ala Leu Leu Ile Val Arg Glu Arg Ser
35 40 45

Leu His Arg Ala Pro Tyr Tyr Leu Leu Leu Asp Leu Cys Leu Ala Asp
50 55 60

Gly Leu Arg Ala Leu Ala Cys Leu Pro Ala Val Met Leu Ala Ala Arg
65 70 75 80

Arg Ala Ala Ala Ala Ala Gly Ala Pro Pro Gly Ala Leu Gly Cys Lys
85 90 95

Leu Leu Ala Phe Leu Ala Ala Leu Phe Cys Phe His Ala Ala Phe Leu
100 105 110

Leu Leu Gly Val Gly Val Thr Arg Tyr Leu Ala Ile Ala His His Arg
115 120 125

Phe Tyr Ala Glu Arg Leu Ala Gly Trp Pro Cys Ala Ala Met Leu Val
130 135 140

Cys Ala Ala Trp Ala Leu Ala Leu Ala Ala Ala Phe Pro Pro Val Leu
145 150 155 160

Asp Gly Gly Gly Asp Asp Glu Asp Ala Pro Cys Ala Leu Glu Gln Arg
165 170 175

Pro Asp Gly Ala Pro Gly Ala Leu Gly Phe Leu Leu Leu Leu Ala Val
180 185 190

Val Val Gly Ala Thr His Leu Val Tyr Leu Arg Leu Leu Phe Phe Ile
195 200 205

His Asp Arg Arg Lys Met Arg Pro Ala Arg Leu Val Pro Ala Val Ser
210 215 220

His Asp Trp Thr Phe His Gly Pro Gly Ala Thr Gly Gln Ala Ala Ala
225 230 235 240

Asn Trp Thr Ala Gly Phe Gly Arg Gly Pro Thr Pro Pro Ala Leu Val
245 250 255

Gly Ile Arg Pro Ala Gly Pro Gly Arg Gly Ala Arg Arg Leu Leu Val
260 265 270

Leu Glu Glu Phe Lys Thr Glu Lys Arg Leu Cys Lys Met Phe Tyr Ala
275 280 285

Val Thr Leu Leu Phe Leu Leu Leu Trp Gly Pro Tyr Val Val Ala Ser
290 295 300

Tyr Leu Arg Val Leu Val Arg Pro Gly Ala Val Pro Gln Ala Tyr Leu
305 310 315 320

Thr Ala Ser Val Trp Leu Thr Phe Ala Gln Ala Gly Ile Asn Pro Val
325 330 335

Val Cys Phe Leu Phe Asn Arg Glu Leu Arg Asp Cys Phe Arg Ala Gln
340 345 350

Phe Pro Cys Cys Gln Ser Pro Arg Thr Thr Gln Ala Thr His Pro Cys
355 360 365

Asp Leu Lys Gly Ile Gly Leu
370 375

(18) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1002 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17: 
ATGAACACCA CAGTGATGCA AGGCTTCAAC AGATCTGAGC GGTGCCCCAG AGACACTCGG 60
ATAGTACAGC TGGTATTCCC AGCCCTCTAC ACAGTGGTTT TCTTGACCGG CATCCTGCTG 120
AATACTTTGG CTCTGTGGGT GTTTGTTCAC ATCCCCAGCT CCTCCACCTT CATCATCTAC 180
CTCAAAAACA CTTTGGTGGC CGACTTGATA ATGACACTCA TGCTTCCTTT CAAAATCCTC 240
TCTGACTCAC ACCTGGCACC CTGGCAGCTC AGAGCTTTTG TGTGTCGTTT TTCTTCGGTG 300
ATATTTTATG AGACCATGTA TGTGGGCATC GTGCTGTTAG GGCTCATAGC CTTTGACAGA 360
TTCCTCAAGA TCATCAGACC TTTGAGAAAT ATTTTTCTAA AAAAACCTGT TTTTGCAAAA 420
ACGGTCTCAA TCTTCATCTG GTTCTTTTTG TTCTTCATCT CCCTGCCAAA TACGATCTTG 480
AGCAACAAGG AAGCAACACC ATCGTCTGTG AAAAAGTGTG CTTCCTTAAA GGGGCCTCTG 540
GGGCTGAAAT GGCATCAAAT GGTAAATAAC ATATGCCAGT TTATTTTCTG GACTGTTTTT 600
ATCCTAATGC TTGTGTTTTA TGTGGTTATT GCAAAAAAAG TATATGATTC TTATAGAAAG 660
TCCAAAAGTA AGGACAGAAA AAACAACAAA AAGCTGGAAG GCAAAGTATT TGTTGTCGTG 720
GCTGTCTTCT TTGTGTGTTT TGCTCCATTT CATTTTGCCA GAGTTCCATA TACTCACAGT 780
CAAACCAACA ATAAGACTGA CTGTAGACTG CAAAATCAAC TGTTTATTGC TAAAGAAACA 840
ACTCTCTTTT TGGCAGCAAC TAACATTTGT ATGGATCCCT TAATATACAT ATTCTTATGT 900
AAAAAATTCA CAGAAAAGCT ACCATGTATG CAAGGGAGAA AGACCACAGC ATCAAGCCAA 960
GAAAATCATA GCAGTCAGAC AGACAACATA ACCTTAGGCT GA 1002

(19) INFORMATION FOR SEQ ID NO: 18: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 333 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18:

Met Asn Thr Thr Val Met Gln Gly Phe Asn Arg Ser Glu Arg Cys Pro
1 5 10 15

Arg Asp Thr Arg Ile Val Gln Leu Val Phe Pro Ala Leu Tyr Thr Val
20 25 30

Val Phe Leu Thr Gly Ile Leu Leu Asn Thr Leu Ala Leu Trp Val Phe
35 40 45

Val His Ile Pro Ser Ser Ser Thr Phe Ile Ile Tyr Leu Lys Asn Thr
50 55 60

Leu Val Ala Asp Leu Ile Met Thr Leu Met Leu Pro Phe Lys Ile Leu
65 70 75 80

Ser Asp Ser His Leu Ala Pro Trp Gln Leu Arg Ala Phe Val Cys Arg
85 90 95

Phe Ser Ser Val Ile Phe Tyr Glu Thr Met Tyr Val Gly Ile Val Leu
100 105 110

Leu Gly Leu Ile Ala Phe Asp Arg Phe Leu Lys Ile Ile Arg Pro Leu
115 120 125

Arg Asn Ile Phe Leu Lys Lys Pro Val Phe Ala Lys Thr Val Ser Ile
130 135 140

Phe Ile Trp Phe Phe Leu Phe Phe Ile Ser Leu Pro Asn Thr Ile Leu
145 150 155 160

Ser Asn Lys Glu Ala Thr Pro Ser Ser Val Lys Lys Cys Ala Ser Leu
165 170 175

Lys Gly Pro Leu Gly Leu Lys Trp His Gln Met Val Asn Asn Ile Cys
180 185 190

Gln Phe Ile Phe Trp Thr Val Phe Ile Leu Met Leu Val Phe Tyr Val
195 200 205

Val Ile Ala Lys Lys Val Tyr Asp Ser Tyr Arg Lys Ser Lys Ser Lys
210 215 220

Asp Arg Lys Asn Asn Lys Lys Leu Glu Gly Lys Val Phe Val Val Val
225 230 235 240

Ala Val Phe Phe Val Cys Phe Ala Pro Phe His Phe Ala Arg Val Pro
245 250 255

Tyr Thr His Ser Gln Thr Asn Asn Lys Thr Asp Cys Arg Leu Gln Asn
260 265 270

Gln Leu Phe Ile Ala Lys Glu Thr Thr Leu Phe Leu Ala Ala Thr Asn
275 280 285

Ile Cys Met Asp Pro Leu Ile Tyr Ile Phe Leu Cys Lys Lys Phe Thr
290 295 300

Glu Lys Leu Pro Cys Met Gln Gly Arg Lys Thr Thr Ala Ser Ser Gln
305 310 315 320

Glu Asn His Ser Ser Gln Thr Asp Asn Ile Thr Leu Gly
325 330

(20) INFORMATION FOR SEQ ID NO: 19: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1122 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19: 

ATGGCCAACA CTACCGGAGA GCCTGAGGAG GTGAGCGGCG CTCTGTCCCC ACCGTCCGCA 60
TCAGCTTATG TGAAGCTGGT ACTGCTGGGA CTGATTATGT GCGTGAGCCT GGCGGGTAAC 120
GCCATCTTGT CCCTGCTGGT GCTCAAGGAG CGTGCCCTGC ACAAGGCTCC TTACTACTTC 180
CTGCTGGACC TGTGCCTGGC CGATGGCATA CGCTCTGCCG TCTGCTTCCC CTTTGTGCTG 240
GCTTCTGTGC GCCACGGCTC TTCATGGACC TTCAGTGCAC TCAGCTGCAA GATTGTGGCC 300
TTTATGGCCG TGCTCTTTTG CTTCCATGCG GCCTTCATGC TGTTCTGCAT CAGCGTCACC 360
CGCTACATGG CCATCGCCCA CCACCGCTTC TACGCCAAGC GCATGACACT CTGGACATGC 420
GCGGCTGTCA TCTGCATGGC CTGGACCCTG TCTGTGGCCA TGGCCTTCCC ACCTGTCTTT 480
GACGTGGGCA CCTACAAGTT TATTCGGGAG GAGGACCAGT GCATCTTTGA GCATCGCTAC 540
TTCAAGGCCA ATGACACGCT GGGCTTCATG CTTATGTTGG CTGTGCTCAT GGCAGCTACC 600
CATGCTGTCT ACGGCAAGCT GCTCCTCTTC GAGTATCGTC ACCGCAAGAT GAAGCCAGTG 660
CAGATGGTGC CAGCCATCAG CCAGAACTGG ACATTCCATG GTCCCGGGGC CACCGGCCAG 720
GCTGCTGCCA ACTGGATCGC CGGCTTTGGC CGTGGGCCCA TGCCACCAAC CCTGCTGGGT 780
ATCCGGCAGA ATGGGCATGC AGCCAGCCGG CGGCTACTGG GCATGGACGA GGTCAAGGGT 840
GAAAAGCAGC TGGGCCGCAT GTTCTACGCG ^ TCTTTCTGCT CCTCTGGTCA 900
CCCTACATCG .GOGG^A CTGGCGAGTG TTTGTGAAAG CCTGTGCTGT GCGGCACCGC 960
TAGCTGGCCA CTGCTGTTTG GATGAGCTTC GCCCAGGCTG GCGTCAACCC AAX.GTCTGC 1020
TTCCTGCTCA ACAAGGAGCT CAAGAAGTGC CTGACCACTC ACGCCCCC.G CTGGGGCACA 1080
GGAGGTGCCC CGGCTCCCAG AGAACCCTAC TGTGTCATGT GA 1122

(21) INFORMATION FOR SEQ ID NO: 20:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 373 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20:
Met Ala Asn Thr Thr Gly Glu Pro Glu Glu Val Ser Gly Ala Leu Ser
1 5 10 15

Pro Pro Ser Ala Ser Ala Tyr Val Lys Leu Val Leu Leu Gly Leu Ile
20 25 30

Met Cys Val Ser Leu Ala Gly Asn Ala Ile Leu Ser Leu Leu Val Leu
35 40 45

Lys Glu Arg Ala Leu His Lys Ala Pro Tyr Tyr Phe Leu Leu Asp Leu
50 55 60

Cys Leu Ala Asp Gly Ile Arg Ser Ala Val Cys Phe Pro Phe Val Leu
65 70 75 80

Ala Ser Val Arg His Gly Ser Ser Trp Thr Phe Ser Ala Leu Ser Cys
85 90 95

Lys Ile Val Ala Phe Met Ala Val Leu Phe Cys Phe His Ala Ala Phe
100 105 110

Met Leu Phe Cys Ile Ser Val Thr Arg Tyr Met Ala Ile Ala His His
115 120 125

Arg Phe Tyr Ala Lys Arg Met Thr Leu Trp Thr Cys Ala Ala Val Ile
130 135 140

Cys Met Ala Trp Thr Leu Ser Val Ala Met Ala Phe Pro Pro Val Phe
145 150 155 160

Asp Val Gly Thr Tyr Lys Phe Ile Arg Glu Glu Asp Gln Cys Ile Phe



165 170 175

Glu His Arg Tyr Phe Lys Ala Asn Asp Thr Leu Gly Phe Met Leu Met
180 185 190

Leu Ala Val Leu Met Ala Ala Thr His Ala Val Tyr Gly Lys Leu Leu
195 200 205

Leu Phe Glu Tyr Arg His Arg Lys Met Lys Pro Val Gln Met Val Pro
210 215 220

Ala Ile Ser Gln Asn Trp Thr Phe His Gly Pro Gly Ala Thr Gly Gln
225 230 235 240

Ala Ala Ala Asn Trp Ile Ala Gly Phe Gly Arg Gly Pro Met Pro Pro
245 250 255

Thr Leu Leu Gly Ile Arg Gln Asn Gly His Ala Ala Ser Arg Arg Leu
260 265 270

Leu Gly Met Asp Glu Val Lys Gly Glu Lys Gln Leu Gly Arg Met Phe
275 280 285

Tyr Ala Ile Thr Leu Leu Phe Leu Leu Leu Trp Ser Pro Tyr Ile Val
290 295 300

Ala Cys Tyr Trp Arg Val Phe Val Lys Ala Cys Ala Val Pro His Arg
305 310 315 320

Tyr Leu Ala Thr Ala Val Trp Met Ser Phe Ala Gln Ala Ala Val Asn
325 330 335

Pro Ile Val Cys Phe Leu Leu Asn Lys Asp Leu Lys Lys Cys Leu Thr
340 345 350

Thr His Ala Pro Cys Trp Gly Thr Gly Gly Ala Pro Ala Pro Arg Glu
355 360 365

Pro Tyr Cys Val Met
370

(22) INFORMATION FOR SEQ ID NO: 21: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1053 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21: 
ATGGCTTTGG AACAGAACCA GTCAACAGAT TATTATTATG AGGAAAATGA AATGAATGGC 60
ACTTATGACT ACAGTCAATA TGAATTGATC TGTATCAAAG AAGATGTCAG AGAATTTGCA 120
AAAGTTTTCC TCCCTGTATT CCTCACAATA GCTTTCGTCA TTGGACTTGC AGGCAATTCC 180
ATGGTAGTGG CAATTTATGC CTATTACAAG AAACAGAGAA CCAAAACAGA TGTGTACATC 240
CTGAATTTGG CTGTAGCAGA TTTACTCCTT CTATTCACTC TGCCTTTTTG GGCTGTTAAT 300
GCAGTTCATG GGTGGGTTTT AGGGAAAATA ATGTGCAAAA TAACTTCAGC CTTGTACACA 360
CTAAACTTTG TCTCTGGAAT GCAGTTTCTG GCTTGCATCA GCATAGACAG ATATGTGGCA 420
GTAACTAATG TCCCCAGGCA ATCAGGAGTG GGAAAACCAT GCTGGATCAT CTGTTTCTGT 480
GTCTGGATGG CTGCCATCTT GCTGAGCATA CCCCAGCTGG TTTTTTATAC AGTAAATGAC 540
AATGCTAGGT GCATTCCCAT TTTCCCCCGC TACCTAGGAA CATCAATGAA AGCATTGATT 600
CAAATGCTAG AGATCTGCAT TGGATTTGTA GTACCCTTTC TTATTATGGG GGTGTGCTAC 660
TTTATCACGG CAAGGACACT CATGAAGATG CCAAACATTA AAATATCTCG ACCCCTAAAA 720
GTTCTGCTCA CAGTCGTTAT AGTTTTCATT GTCACTCAAC TGCCTTATAA CATTGTCAAG 780
TTCTGCCGAG CCATAGACAT CATCTACTCC CTGATCACCA GCTGCAACAT GAGCAAACGC 840
ATGGACATCG CCATCCAAGT CACAGAAAGC ATTGCACTCT TTCACAGCTG CCTCAACCCA 900
ATCCTTTATG TTTTTATGGG AGCATCTTTC AAAAACTACG TTATGAAAGT GGCCAAGAAA 960
TATGGGTCCT GGAGAAGACA GAGACAAAGT GTGGAGGAGT TTCCTTTTGA TTCTGAGGGT 1020
CCTACAGAGC CAACCAGTAC TTTTAGCATT TAA 1053

(23) INFORMATION FOR SEQ ID NO: 22:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 350 amino acids
(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22:

Met Ala Leu Glu Gln Asn Gln Ser Thr Asp Tyr Tyr Tyr Glu Glu Asn
1 5 10 15

Glu Met Asn Gly Thr Tyr Asp Tyr Ser Gln Tyr Glu Leu Ile Cys Ile
20 25 30

Lys Glu Asp Val Arg Glu Phe Ala Lys Val Phe Leu Pro Val Phe Leu
35 40 45

Thr Ile Ala Phe Val Ile Gly Leu Ala Gly Asn Ser Met Val Val Ala
50 55 60

Ile Tyr Ala Tyr Tyr Lys Lys Gln Arg Thr Lys Thr Asp Val Tyr Ile
65 70 75 80

Leu Asn Leu Ala Val Ala Asp Leu Leu Leu Leu Phe Thr Leu Pro Phe
85 90 95

Trp Ala Val Asn Ala Val His Gly Trp Val Leu Gly Lys Ile Met Cys
100 105 110

Lys Ile Thr Ser Ala Leu Tyr Thr Leu Asn Phe Val Ser Gly Met Gln
115 120 125

Phe Leu Ala Cys Ile Ser Ile Asp Arg Tyr Val Ala Val Thr Asn Val
130 135 140

Pro Ser Gln Ser Gly Val Gly Lys Pro Cys Trp Ile Ile Cys Phe Cys
145 150 155 160

Val Trp Met Ala Ala Ile Leu Leu Ser Ile Pro Gln Leu Val Phe Tyr
165 170 175

Thr Val Asn Asp Asn Ala Arg Cys Ile Pro Ile Phe Pro Arg Tyr Leu
180 185 190

Gly Thr Ser Met Lys Ala Leu Ile Gln Met Leu Glu Ile Cys Ile Gly
195 200 205

Phe Val Val Pro Phe Leu Ile Met Gly Val Cys Tyr Phe Ile Thr Ala
210 215 220

Arg Thr Leu Met Lys Met Pro Asn Ile Lys Ile Ser Arg Pro Leu Lys
225 230 235 240

Val Leu Leu Thr Val Val Ile Val Phe Ile Val Thr Gln Leu Pro Tyr
245 250 255

Asn Ile Val Lys Phe Cys Arg Ala Ile Asp Ile Ile Tyr Ser Leu Ile
260 265 270

Thr Ser Cys Asn Met Ser Lys Arg Met Asp Ile Ala Ile Gln Val Thr
275 280 285

Glu Ser Ile Ala Leu Phe His Ser Cys Leu Asn Pro Ile Leu Tyr Val
290 295 300

Phe Met Gly Ala Ser Phe Lys Asn Tyr Val Met Lys Val Ala Lys Lys
305 310 315 320

Tyr Gly Ser Trp Arg Arg Gln Arg Gln Ser Val Glu Glu Phe Pro Phe
325 330 335

Asp Ser Glu Gly Pro Thr Glu Pro Thr Ser Thr Phe Ser Ile
340 345 350



(24) INFORMATION FOR SEQ ID NO: 23:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1116 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23:
ATGCCAGGAA ACGCCACCCC AGTGACCACC ACTGCCCCGT GGGCCTCCCT GGGCCTCTCC 60
GCCAAGACCT GCAACAACGT GTCCTTCGAA GAGAGCAGGA TAGTCCTGGT CGTGGTGTAC 120
AGCGCGGTGT GCACGCTGGG GGTGCCGGCC AACTGCCTGA CTGCGTGGCT GGCGCTGCTG 180
CAGGTACTGC AGGGCAACGT GCTGGCCGTC TACCTGCTCT GCCTGGCACT CTGCGAACTG 240
CTGTACACAG GCACGCTGCC ACTCTGGGTC ATCTATATCC GCAACCAGCA CCGCTGGACC 300
CTAGGCCTGC TGGCCTCGAA GGTGACCGCC TACATCTTCT TCTGCAACAT CTACGTCAGC 360
ATCCTCTTCC TGTGCTGCAT CTCCTGCGAC CGCTTCGTGG CCGTGGTGTA CGCGCTGGAG 420
AGTCGGGGCC GCCGCCGCCG GAGGACCGCC ATCCTCATCT CCGCCTGCAT CTTCATCCTC 480
GTCGGGATCG TTCACTACCC GGTGTTCCAG ACGGAAGACA AGGAGACCTG CTTTGACATG 540
CTGCAGATGG ACAGCAGGAT TGCCGGGTAC TACTACGCCA GGTTCACCGT TGGCTTTGCC 600
ATCCCTCTCT CCATCATCGC CTTCACCAAC CACCGGATTT TCAGGAGCAT CAAGCAGAGC 660
ATGGGCTTAA GCGCTGCCCA GAAGGCCAAG GTGAAGCACT CGGCCATCGC GGTGGTTGTC 720
ATCTTCCTAG TCTGCTTCGC CCCGTACCAC CTGGTTCTCC TCGTCAAAGC CGCTGCCTTT 780
TCCTACTACA GAGGAGACAG GAACGCCATG TGCGGCTTGG AGGAAAGGCT GTACACAGCC 840
TCTGTGGTGT TTCTGTGCCT GTCCACGGTG AACGGCGTGG CTGACCCCAT TATCTACGTG 900
CTGGCCACGG ACCATTCCCG CCAAGAAGTG TCCAGAATCC ATAAGGGGTG GAAAGAGTGG 960
TCCATGAAGA CAGACGTCAC CAGGCTCACC CACAGCAGGG ACACCGAGGA GCTGCAGTCG 1020
CCCGTGGCCC TTGCAGACCA CTACACCTTC TCCAGGCCCG TGCACCCACC AGGGTCACCA 1080
TGCCCTGCAA AGAGGCTGAT TGAGGAGTCC TGCTGA 1116

(25) INFORMATION FOR SEQ ID NO: 24: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 371 amino acids
(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:24:

Met Pro Gly Asn Ala Thr Pro Val Thr Thr Thr Ala Pro Trp Ala Ser
1 5 10 15

Leu Gly Leu Ser Ala Lys Thr Cys Asn Asn Val Ser Phe Glu Glu Ser
20 25 30

Arg Ile Val Leu Val Val Val Tyr Ser Ala Val Cys Thr Leu Gly Val
35 40 45

Pro Ala Asn Cys Leu Thr Ala Trp Leu Ala Leu Leu Gln Val Leu Gln
50 55 60

Gly Asn Val Leu Ala Val Tyr Leu Leu Cys Leu Ala Leu Cys Glu Leu
65 70 75 80

Leu Tyr Thr Gly Thr Leu Pro Leu Trp Val Ile Tyr Ile Arg Asn Gln
85 90 95

His Arg Trp Thr Leu Gly Leu Leu Ala Ser Lys Val Thr Ala Tyr Ile
100 105 110

Phe Phe Cys Asn Ile Tyr Val Ser Ile Leu Phe Leu Cys Cys Ile Ser
115 120 125

Cys Asp Arg Phe Val Ala Val Val Tyr Ala Leu Glu Ser Arg Gly Arg
130 135 140

Arg Arg Arg Arg Thr Ala Ile Leu Ile Ser Ala Cys Ile Phe Ile Leu
145 150 155 160

Val Gly Ile Val His Tyr Pro Val Phe Gln Thr Glu Asp Lys Glu Thr
165 170 175

Cys Phe Asp Met Leu Gln Met Asp Ser Arg Ile Ala Gly Tyr Tyr Tyr
180 185 190

Ala Arg Phe Thr Val Gly Phe Ala Ile Pro Leu Ser Ile Ile Ala Phe
195 200 205

Thr Asn His Arg Ile Phe Arg Ser Ile Lys Gln Ser Met Gly Leu Ser
210 215 220

Ala Ala Gln Lys Ala Lys Val Lys His Ser Ala Ile Ala Val Val Val

225 230 235 240

Ile Phe Leu Val Cys Phe Ala Pro Tyr His Leu Val Leu Leu Val Lys
245 250 255

Ala Ala Ala Phe Ser Tyr Tyr Arg Gly Asp Arg Asn Ala Met Cys Gly
260 265 270

Leu Glu Glu Arg Leu Tyr Thr Ala Ser Val Val Phe Leu Cys Leu Ser
275 280 285

Thr Val Asn Gly Val Ala Asp Pro Ile Ile Tyr Val Leu Ala Thr Asp
290 295 300

His Ser Arg Gln Glu Val Ser Arg Ile His Lys Gly Trp Lys Glu Trp
305 310 315 320

Ser Met Lys Thr Asp Val Thr Arg Leu Thr His Ser Arg Asp Thr Glu
325 330 335

Glu Leu Gln Ser Pro Val Ala Leu Ala Asp His Tyr Thr Phe Ser Arg
340 345 350

Pro Val His Pro Pro Gly Ser Pro Cys Pro Ala Lys Arg Leu Ile Glu
355 360 365

Glu Ser Cys
370

(26) INFORMATION FOR SEQ ID NO: 25:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1113 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25:
ATGGCGAACT ATAGCCATGC AGCTGACAAC ATTTTGCAAA ATCTCTCGCC TCTAACAGCC 60
TTTCTGAAAC TGACTTCCTT GGGTTTCATA ATAGGAGTCA GCGTGGTGGG CAACCTCCTG 120
ATCTCCATTT TGCTAGTGAA AGATAAGACC TTGCATAGAG CACCTTACTA CTTCCTGTTG 180
GATCTTTGCT GTTCAGATAT CCTCAGATCT GCAATTTGTT TCCCATTTGT GTTCAACTCT 240
GTCAAAAATG GCTCTACCTG GACTTATGGG ACTCTGACTT GCAAAGTGAT TGCCTTTCTG 300
GGGGTTTTGT CCTGTTTCCA CACTGCTTTC ATGCTCTTCT GCATCAGTGT CACCAGATAC 360
TTAGCTATCG CCCATCACCG CTTCTATACA AAGAGGCTGA CCTTTTGGAC GTGTCTGGCT 420
GTGATCTGTA TGGTGTGGAC TCTGTCTGTG GCCATGGCAT TTCCCCCGGT TTTAGACGTG 480
GGCACTTACT CATTCATTAG GGAGGAAGAT CAATGCACCT TCCAACACCG CTCCTTCAGG 540
GCTAATGATT CCTTAGGATT TATGCTGCTT CTTGCTCTCA TCCTCCTAGC CACACAGCTT 600
GTCTACCTCA AGCTGATATT TTTCGTCCAC GATCGAAGAA AAATGAAGCC AGTCCAGTTT 660
GTAGCAGCAG TCAGCCAGAA CTGGACTTTT CATGGTCCTG GAGCCAGTGG CCAGGCAGCT 720
GCCAATTGGC TAGCAGGATT TGGAAGGGGT CCCACACCAC CCACCTTGCT GGGCATCAGG 780
CAAAATGCAA ACACCACAGG CAGAAGAAGG CTATTGGTCT TAGACGAGTT CAAAATGGAG 840
AAAAGAATCA GCAGAATGTT CTATATAATG ACTTTTCTGT TTCTAACCTT GTGGGGCCCC 900
TACCTGGTGG CCTGTTATTG GAGAGTTTTT GCAAGAGGGC CTGTAGTACC AGGGGGATTT 960
CTAACAGCTG CTGTCTGGAT GAGTTTTGCC CAAGCAGGAA TCAATCCTTT TGTCTGCATT 1020
TTCTCAAACA GGGAGCTGAG GCGCTGTTTC AGCACAACCC TTCTTTACTG CAGAAAATCC 1080
AGGTTACCAA GGGAACCTTA CTGTGTTATA TGA 1113

(27) INFORMATION FOR SEQ ID NO:26: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 370 amino acids
(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:26: 

Met Ala Asn Tyr Ser His Ala Ala Asp Asn Ile Leu Gln Asn Leu Ser
1 5 10 15

Pro Leu Thr Ala Phe Leu Lys Leu Thr Ser Leu Gly Phe Ile Ile Gly
20 25 30

Val Ser Val Val Gly Asn Leu Leu Ile Ser Ile Leu Leu Val Lys Asp
35 40 45

Lys Thr Leu His Arg Ala Pro Tyr Tyr Phe Leu Leu Asp Leu Cys Cys
50 55 60

Ser Asp Ile Leu Arg Ser Ala Ile Cys Phe Pro Phe Val Phe Asn Ser
65 70 75 80



Val Lys Asn Gly Ser Thr Trp Thr Tyr Gly Thr Leu Thr Cys Lys Val
85 90 95

Ile Ala Phe Leu Gly Val Leu Ser Cys Phe His Thr Ala Phe Met Leu
100 105 110

Phe Cys Ile Ser Val Thr Arg Tyr Leu Ala Ile Ala His His Arg Phe
115 120 125

Tyr Thr Lys Arg Leu Thr Phe Trp Thr Cys Leu Ala Val Ile Cys Met
130 135 140

Val Trp Thr Leu Ser Val Ala Met Ala Phe Pro Pro Val Leu Asp Val
145 150 155 160

Gly Thr Tyr Ser Phe Ile Arg Glu Glu Asp Gln Cys Thr Phe Gln His
165 170 175

Arg Ser Phe Arg Ala Asn Asp Ser Leu Gly Phe Met Leu Leu Leu Ala
180 185 190

Leu Ile Leu Leu Ala Thr Gln Leu Val Tyr Leu Lys Leu Ile Phe Phe
195 200 205

Val His Asp Arg Arg Lys Met Lys Pro Val Gln Phe Val Ala Ala Val
210 215 220

Ser Gln Asn Trp Thr Phe His Gly Pro Gly Ala Ser Gly Gln Ala Ala
225 230 235 240

Ala Asn Trp Leu Ala Gly Phe Gly Arg Gly Pro Thr Pro Pro Thr Leu
245 250 255

Leu Gly Ile Arg Gln Asn Ala Asn Thr Thr Gly Arg Arg Arg Leu Leu
260 265 270

Val Leu Asp Glu Phe Lys Met Glu Lys Arg Ile Ser Arg Met Phe Tyr
275 280 285

Ile Met Thr Phe Leu Phe Leu Thr Leu Trp Gly Pro Tyr Leu Val Ala
290 295 300

Cys Tyr Trp Arg Val Phe Ala Arg Gly Pro Val Val Pro Gly Gly Phe
305 310 315 320

Leu Thr Ala Ala Val Trp Met Ser Phe Ala Gln Ala Gly Ile Asn Pro
325 330 335

Phe Val Cys Ile Phe Ser Asn Arg Glu Leu Arg Arg Cys Phe Ser Thr
340 345 350

Thr Leu Leu Tyr Cys Arg Lys Ser Arg Leu Pro Arg Glu Pro Tyr Cys
355 360 365

Val Ile
370



(28) INFORMATION FOR SEQ ID NO: 27:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1080 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27: 
ATGCAGGTCC CGAACAGCAC CGGCCCGGAC AACGCGACGC TGCAGATGCT GCGGAACCCG 60
GCGATCGCGG TGGCCCTGCC CGTGGTGTAC TCGCTGGTGG CGGCGGTCAG CATCCCGGGC 120
AACCTCTTCT CTCTGTGGGT GCTGTGCCGG CGCATGGGGC CCAGATCCCC GTCGGTCATC 180
TTCATGATCA ACCTGAGCGT CACGGACCTG ATGCTGGCCA GCGTGTTGCC TTTCCAAATC 240
TACTACCATT GCAACCGCCA CCACTGGGTA TTCGGGGTGC TGCTTTGCAA CGTGGTGACC 300
GTGGCCTTTT ACGCAAACAT GTATTCCAGC ATCCTCACCA TGACCTGTAT CAGCGTGGAG 360
CGCTTCCTGG GGGTCCTGTA CCCGCTCAGC TCCAAGCGCT GGCGCCGCCG TCGTTACGCG 420
GTGGCCGCGT GTGCAGGGAC CTGGCTGCTG CTCCTGACCG CCCTGTGCCC GCTGGCGCGC 480
ACCGATCTCA CCTACCCGGT GCACGCCCTG GGCATCATCA CCTGCTTCGA CGTCCTCAAG 540
TGGACGATGC TCCCCAGCGT GGCCATGTGG GCCGTGTTCC TCTTCACCAT CTTCATCCTG 600
CTGTTCCTCA TCCCGTTCGT GATCACCGTG GCTTGTTACA CGGCCACCAT CCTCAAGCTG 660
TTGCGCACGG AGGAGGCGCA CGGCCGGGAG CAGCGGAGGC GCGCGGTGGG CCTGGCCGCG 720
GTGGTCTTGC TGGCCTTTGT CACCTGCTTC GCCCCCAACA ACTTCGTGCT CCTGGCGCAC 780
ATCGTGAGCC GCCTGTTCTA CGGCAAGAGC TACTACCACG TGTACAAGCT CACGCTGTGT 840
CTCAGCTGCC TCAACAACTG TCTGGACCCG TTTGTTTATT ACTTTGCGTC CCGGGAATTC 900
CAGCTGCGCC TGCGGGAATA TTTGGGCTGC CGCCGGGTGC CCAGAGACAC CCTGGACACG 960
CGCCGCGAGA GCCTCTTCTC CGCCAGGACC ACGTCCGTGC GCTCCGAGGC CGGTGCGCAC 1020
CCTGAAGGGA TGGAGGGAGC CACCAGGCCC GGCCTCCAGA GGCAGGAGAG TGTGTTCTGA 1080

(29) INFORMATION FOR SEQ ID NO: 28:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 359 amino acids

(B) TYPE: amino acid
(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28:
Met Gln Val Pro Asn Ser Thr Gly Pro Asp Asn Ala Thr Leu Gln Met
1 5 10 15

Leu Arg Asn Pro Ala Ile Ala Val Ala Leu Pro Val Val Tyr Ser Leu
20 25 30

Val Ala Ala Val Ser Ile Pro Gly Asn Leu Phe Ser Leu Trp Val Leu
35 40 45

Cys Arg Arg Met Gly Pro Arg Ser Pro Ser Val Ile Phe Met Ile Asn
50 55 60

Leu Ser Val Thr Asp Leu Met Leu Ala Ser Val Leu Pro Phe Gln Ile
65 70 75 80

Tyr Tyr His Cys Asn Arg His His Trp Val Phe Gly Val Leu Leu Cys
85 90 95

Asn Val Val Thr Val Ala Phe Tyr Ala Asn Met Tyr Ser Ser Ile Leu
100 105 110

Thr Met Thr Cys Ile Ser Val Glu Arg Phe Leu Gly Val Leu Tyr Pro
115 120 125

Leu Ser Ser Lys Arg Trp Arg Arg Arg Arg Tyr Ala Val Ala Ala Cys
130 135 140

Ala Gly Thr Trp Leu Leu Leu Leu Thr Ala Leu Cys Pro Leu Ala Arg
145 150 155 160

Thr Asp Leu Thr Tyr Pro Val His Ala Leu Gly Ile Ile Thr Cys Phe
165 170 175

Asp Val Leu Lys Trp Thr Met Leu Pro Ser Val Ala Met Trp Ala Val
180 185 190

Phe Leu Phe Thr Ile Phe Ile Leu Leu Phe Leu Ile Pro Phe Val Ile
195 200 205

Thr Val Ala Cys Tyr Thr Ala Thr Ile Leu Lys Leu Leu Arg Thr Glu
210 215 220

Glu Ala His Gly Arg Glu Gln Arg Arg Arg Ala Val Gly Leu Ala Ala
225 230 235 240

Val Val Leu Leu Ala Phe Val Thr Cys Phe Ala Pro Asn Asn Phe Val
245 250 255

Leu Leu Ala His Ile Val Ser Arg Leu Phe Tyr Gly Lys Ser Tyr Tyr
260 265 270

His Val Tyr Lys Leu Thr Leu Cys Leu Ser Cys Leu Asn Asn Cys Leu
275 280 285

Asp Pro Phe Val Tyr Tyr Phe Ala Ser Arg Glu Phe Gln Leu Arg Leu
290 295 300

Arg Glu Tyr Leu Gly Cys Arg Arg Val Pro Arg Asp Thr Leu Asp Thr
305 310 315 320

Arg Arg Glu Ser Leu Phe Ser Ala Arg Thr Thr Ser Val Arg Ser Glu
325 330 335

Ala Gly Ala His Pro Glu Gly Met Glu Gly Ala Thr Arg Pro Gly Leu
340 345 350

Gln Arg Gln Glu Ser Val Phe
355

(30) INFORMATION FOR SEQ ID NO: 29: 



(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1503 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:29:

ATGGAGCGTC CCTGGGAGGA CAGCCCAGGC CCGGAGGGGG CAGCTGAGGG CTCGCCTGTG 60
CCAGTCGCCG CCGGGGCGCG CTCCGGTGCC GCGGCGAGTG GCACAGGCTG GCAGCCATGG 120
GCTGAGTGCC CGGGACCCAA GGGGAGGGGG CAACTGCTGG CGACCGCCGG CCCTTTGCGT 180
CGCTGGCCCG CCCCCTCGCC TGCCAGCTCC AGCCCCGCCC CCGGAGCGGC GTCCGCTCAC 240
TCGGTTCAAG GCAGCGCGAC TGCGGGTGGC GCACGACCAG GGCGCAGACC TTGGGGCGCG 300
CGGCCCATGG AGTCGGGGCT GCTGCGGCCG GCGCCGGTGA GCGAGGTCAT CGTCCTGCAT 360
TACAACTACA CCGGCAAGCT CCGCGGTGCG AGCTACCAGC CGGGTGCCGG CCTGCGCGCC 420
GACGCCGTGG TGTGCCTGGC GGTGTGCGCC TTCATCGTGC TAGAGAATCT AGCCGTGTTG 480
TTGGTGCTCG GACGCCACCC GCGCTTCCAC GCTCCCATGT TCCTGCTCCT GGGCAGCCTC 540
ACGTTGTCGG ATCTGCTGGC AGGCGCCGCC TACGCCGCCA ACATCCTACT GTCGGGGCCG 600
CTCACGCTGA AACTGTCCCC CGCGCTCTGG TTCGCACGGG AGGGAGGCGT CTTCGTGGCA 660
CTCACTGCGT CCGTGCTGAG CCTCCTGGCC ATCGCGCTGG AGCGCAGCCT CACCATGGCG 720




CGCAGGGGGC CCGCGCCCGT CTCCAGTCGG GGGCGCACGC TGGCGATGGC AGCCGCGGCC 780
TGGGCCGTGT CGCTGCTCCT CGGGCTCCTG CCAGCGCTGG GOTGGAATTG CCTGGGTCGC 840
CTGGACGCTT GCTCCACTGT CTTGCCGCTC TACGGGAAGG CCTACGTGCT CTTCTGCGTG 900
CTCGCCTTCG TGGGCATCCT GGCCGCGATC TGTGCACTCT ACGCGCGCAT CTACTGCCAG 960
GTACGCGCCA ACGGGCGGCG CCTGCCGGCA GGGCCGGGGA CTGCGGGGAC CACCTCGAGC 1020
CGGGCGCGTC GCAAGCCGCG GTCTCTGGCG TTGCTGCGCA CGCTGAGGGT GGTGCTGGTG 1080
GCCTTTGTGG CATGTTGGGG CCCCCTCTTC CTGCTGCTGT TGCTCGACGT GGGGTGCGGG 1140
GCGCGCACCT GTCCTGTACT GCTGGAGGCC GATGCGTTCG TGGGACTGGC CATGGCCAAC 1200
TCACTTCTGA ACGCCATCAT CTACACGCTC ACCAACCGCG ACCTGCGCCA CGCGCTCCTG 1260
CGCCTGGTCT GCTGCGGACG CCACTCCTGC GGCAGAGACC CGAGTGGCTC CCAGCAGTCG 1320
GCGAGCGCGG CTGAGGCTTC CGGGGGCCTG CGCCGCTGCC TGCCCCCGGG CCTTGATGGG 1380
AGCTTCAGCG GCTCGGAGCG CTCATCGCCC CAGCGCGACG GGCTGGACAC CAGCGGCTCC 1440
ACAGGCAGCC CCGGTGCACC CACAGCCGCC CGGACTCTGG TATCAGAACC GGCTGCAGAC 1500
TGA 1503

(31) INFORMATION FOR SEQ ID NO: 30:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 500 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30:

Met Glu Arg Pro Trp Glu Asp Ser Pro Gly Pro Glu Gly Ala Ala Glu
1 5 10 15

Gly Ser Pro Val Pro Val Ala Ala Gly Ala Arg Ser Gly Ala Ala Ala
20 25 30

Ser Gly Thr Gly Trp Gln Pro Trp Ala Glu Cys Pro Gly Pro Lys Gly
35 40 45

Arg Gly Gln Leu Leu Ala Thr Ala Gly Pro Leu Arg Arg Trp Pro Ala
50 55 60

Pro Ser Pro Ala Ser Ser Ser Pro Ala Pro Gly Ala Ala Ser Ala His
65 70 75 80

Ser Val Gln Gly Ser Ala Thr Ala Gly Gly Ala Arg Pro Gly Arg Arg
85 90 95

Pro Trp Gly Ala Arg Pro Met Glu Ser Gly Leu Leu Arg Pro Ala Pro
100 105 110

Val Ser Glu Val Ile Val Leu His Tyr Asn Tyr Thr Gly Lys Leu Arg
115 120 125

Gly Ala Ser Tyr Gln Pro Gly Ala Gly Leu Arg Ala Asp Ala Val Val
130 135 140

Cys Leu Ala Val Cys Ala Phe Ile Val Leu Glu Asn Leu Ala Val Leu
145 150 155 160

Leu Val Leu Gly Arg His Pro Arg Phe His Ala Pro Met Phe Leu Leu
165 170 175

Leu Gly Ser Leu Thr Leu Ser Asp Leu Leu Ala Gly Ala Ala Tyr Ala
180 185 190

Ala Asn Ile Leu Leu Ser Gly Pro Leu Thr Leu Lys Leu Ser Pro Ala
195 200 205

Leu Trp Phe Ala Arg Glu Gly Gly Val Phe Val Ala Leu Thr Ala Ser
210 215 220

Val Leu Ser Leu Leu Ala Ile Ala Leu Glu Arg Ser Leu Thr Met Ala
225 230 235 240

Arg Arg Gly Pro Ala Pro Val Ser Ser Arg Gly Arg Thr Leu Ala Met
245 250 255

Ala Ala Ala Ala Trp Gly Val Ser Leu Leu Leu Gly Leu Leu Pro Ala
260 265 270

Leu Gly Trp Asn Cys Leu Gly Arg Leu Asp Ala Cys Ser Thr Val Leu
275 280 285

Pro Leu Tyr Ala Lys Ala Tyr Val Leu Phe Cys Val Leu Ala Phe Val
290 295 300

Gly Ile Leu Ala Ala Ile Cys Ala Leu Tyr Ala Arg Ile Tyr Cys Gln
305 310 315 320

Val Arg Ala Asn Ala Arg Arg Leu Pro Ala Arg Pro Gly Thr Ala Gly
325 330 335

Thr Thr Ser Thr Arg Ala Arg Arg Lys Pro Arg Ser Leu Ala Leu Leu
340 345 350

Arg Thr Leu Ser Val Val Leu Leu Ala Phe Val Ala Cys Trp Gly Pro
355 360 365

Leu Phe Leu Leu Leu Leu Leu Asp Val Ala Cys Pro Ala Arg Thr Cys




370 375 380

Pro Val Leu Leu Gln Ala Asp Pro Phe Leu Gly Leu Ala Met Ala Asn
385 390 395 400

Ser Leu Leu Asn Pro Ile Ile Tyr Thr Leu Thr Asn Arg Asp Leu Arg
405 410 415
A ^ ^ ^ LeU V ^ Gly Arg H 1S Ser Cys Gly Arg 



25 



4 30 



10 



Asp Pro Ser Gly Ser Gin Gin Ser Ala S— nh ,u, 1n . „ 

43s ° — --^ '--ci o^u >\j_a ser Gly 

440 445 
«y Arg Arg Cys Leu Pro Pro Gly Leu Asp Gly Ser P he Ser Gly 

455 460 

Ser Glu Arg ser ser Gln Arg Qiy ^ p T ^ s ^ q ^ 

4/b 480 
Thr Gly Ser Pro Gly Ala Pro Thr Ala Ala Arg Thr Leu Val Ser Glu
485 490 495

Pro Ala Ala Asp
500

(32) INFORMATION FOR SEQ ID NO: 31: 

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1029 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31:

ATGCAAGCCG TCGACAATCT CACCTCTGCG CCTGGGAACA CCAGTCTGTG CACCAGAGAC 60
TACAAAATCA CCCAGGTCCT CTTCCCACTG CTCTACACTG TCCTGTTTTT TGTTGGACTT 120
ATCACAAATG GOCTGGCGAT GAGGATTTTC TTTCAAATCC GGAGTAAATC AAACTTTATT 180
ATTTTTCTTA AGAACACAGT CATTTCTGAT CTTCTCATGA TTCTGACTTT TCCATTCAAA 240
ATTCTTAGTG ATGCCAAACT GGGAACAGGA CCACTGAGAA CTTTTGTGTG TCAAGTTACC 300
TCCGTCATAT TTTATTTCAC AATGTATATC AGTATTTCAT TCCTGGGACT GATAACTATC 360
GATCGCTACC AGAAGACCAC CAGGCCATTT AAAACATCCA ACCCCAAAAA TCTCTTGGGG 420
GCTAAGATTC TCTCTGTTGT CATCTGGGCA TTCATGTTCT TACTCTCTTT GCCTAACATG 480
ATTCTGACCA ACAGGCAGCC GAGAGACAAG AATGTGAAGA AATGCTCTTT CCTTAAATCA 540
GAGTTCGGTC TAGTCTGGCA TGAAATAGTA AATTACATCT GTCAAGTCAT TTTCTGGATT 600
AATTTCTTAA TTGTTATTGT ATGTTATACA CTCATTACAA AAGAACTGTA CCGGTCATAC 660
GTAAGAACGA GGGGTGTAGG TAAAGTCCCC AGGAAAAAGG TGAACGTCAA AGTTTTCATT 720
ATCATTGCTG TATTCTTTAT TTGTTTTGTT CCTTTCCATT TTGCCCGAAT TCCTTACACC 780
CTGAGCCAAA CCCGGGATGT CTTTGACTGC ACTGCTGAAA ATACTCTGTT CTATGTGAAA 840
GAGAGCACTC TGTGGTTAAC TTCCTTAAAT GCATGCCTGG ATCCGTTCAT CTATTTTTTC 900
CTTTGCAAGT CCTTCAGAAA TTCCTTGATA AGTATGCTGA AGTGCCCCAA TTCTGCAACA 960
TCTCTGTCCC AGGACAATAG GAAAAAAGAA CAGGATGGTG GTGACCCAAA TGAAGAGACT 1020
CCAATGTAA 1029

(33) INFORMATION FOR SEQ ID NO: 32: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 342 amino acids 

(B) TYPE: amino acid 
(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32: 

Met Gln Ala Val Asp Asn Leu Thr Ser Ala Pro Gly Asn Thr Ser Leu
1 5 10 15

Cys Thr Arg Asp Tyr Lys Ile Thr Gln Val Leu Phe Pro Leu Leu Tyr
20 25 30

Thr Val Leu Phe Phe Val Gly Leu Ile Thr Asn Gly Leu Ala Met Arg
35 40 45

Ile Phe Phe Gln Ile Arg Ser Lys Ser Asn Phe Ile Ile Phe Leu Lys
50 55 60

Asn Thr Val Ile Ser Asp Leu Leu Met Ile Leu Thr Phe Pro Phe Lys
65 70 75 80

Ile Leu Ser Asp Ala Lys Leu Gly Thr Gly Pro Leu Arg Thr Phe Val
85 90 95

Cys Gln Val Thr Ser Val Ile Phe Tyr Phe Thr Met Tyr Ile Ser Ile
100 105 110

Ser Phe Leu Gly Leu Ile Thr Ile Asp Arg Tyr Gln Lys Thr Thr Arg
115 120 125

Pro Phe Lys Thr Ser Asn Pro Lys Asn Leu Leu Gly Ala Lys Ile Leu
130 135 140

Ser Val Val Ile Trp Ala Phe Met Phe Leu Leu Ser Leu Pro Asn Met
145 150 155 160

Ile Leu Thr Asn Arg Gln Pro Arg Asp Lys Asn Val Lys Lys Cys Ser
165 170 175

Phe Leu Lys Ser Glu Phe Gly Leu Val Trp His Glu Ile Val Asn Tyr
180 185 190

Ile Cys Gln Val Ile Phe Trp Ile Asn Phe Leu Ile Val Ile Val Cys
195 200 205

Tyr Thr Leu Ile Thr Lys Glu Leu Tyr Arg Ser Tyr Val Arg Thr Arg
210 215 220

Gly Val Gly Lys Val Pro Arg Lys Lys Val Asn Val Lys Val Phe Ile
225 230 235 240

Ile Ile Ala Val Phe Phe Ile Cys Phe Val Pro Phe His Phe Ala Arg
245 250 255

Ile Pro Tyr Thr Leu Ser Gln Thr Arg Asp Val Phe Asp Cys Thr Ala
260 265 270

Glu Asn Thr Leu Phe Tyr Val Lys Glu Ser Thr Leu Trp Leu Thr Ser
275 280 285

Leu Asn Ala Cys Leu Asp Pro Phe Ile Tyr Phe Phe Leu Cys Lys Ser
290 295 300

Phe Arg Asn Ser Leu Ile Ser Met Leu Lys Cys Pro Asn Ser Ala Thr
305 310 315 320

Ser Leu Ser Gln Asp Asn Arg Lys Lys Glu Gln Asp Gly Gly Asp Pro
325 330 335

Asn Glu Glu Thr Pro Met
340

(34) INFORMATION FOR SEQ ID NO: 33:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1077 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33:
ATGTCGGTCT GCTACCGTCC CCCAGGGAAC GAGACACTGC TGAGCTGGAA GACTTCGCGG 60
GCCACAGGCA CAGCCTTCCT GCTGCTGGCG GCGCTGCTGG GGCTGCCTGG CAACGGCTTC 120
GTGGTGTGGA GCTTGGCGGG CTGGCGGCCT GCACGGGGGC GACCGCTGGC GGCCACGCTT 180
GTGCTGCACC TGGCGCTGGC CGACGGCGCG GTGCTGCTGC TCACGCCGCT CTTTGTGGCC 240
TTCCTGACCC GGCAGGCCTG GCCGCTGGGC CAGGCGGGCT GCAAGGCGGT GTACTACGTG 300
TGCGCGCTCA GCATGTACGC CAGCGTGCTG CTCACCGGCC TGCTCAGCCT GCAGCGCTGC 360
CTCGCAGTCA CCCGCCCCTT CCTGGCGCCT CGGCTGCGCA GCCCGGCCCT GGCCCGCCGC 420
CTGCTGCTGG CGGTCTGGCT GGCCGCCCTG TTGCTCGCCG TCCCGGCCGC CGTCTACCGC 480
CACCTGTGGA GGGACCGCGT ATGCCAGCTG TGCCACCCGT CGCCGGTCCA CGCCGCCGCC 540
CACCTGAGCC TGGAGACTCT GACCGCTTTC GTGCTTCCTT TCGGGCTGAT GCTCGGCTGC 600
TACAGCGTGA CGCTGGCACG GCTGCGGGGC GCCCGCTGGG GCTCCGGGCG GCACGGGGCG 660
CGGGTGGGCC GGCTGGTGAG CGCCATCGTG CTTGCCTTCG GCTTGCTCTG GGCCCCCTAC 720
CACGCAGTCA ACCTTCTGCA GGCGGTCGCA GCGCTGGCTC CACCGGAAGG GGCCTTGGCG 780
AAGCTGGGCG GAGCCGGCCA GGCGGCGCGA GCGGGAACTA CGGCCTTGGC CTTCTTCAGT 840
TCTAGCGTCA ACCCGGTGCT CTACGTCTTC ACCGCTGGAG ATCTGCTGCC CCGGGCAGGT 900
CCCCGTTTCC TCACGCGGCT CTTCGAAGGC TCTGGGGAGG CCCGAGGGGG CGGCCGCTCT 960
AGGGAAGGGA CCATGGAGCT CCGAACTACC CCTCAGCTGA AAGTGGTGGG GCAGGGCCGC 1020
GGCAATGGAG ACCCGGGGGG TGGGATGGAG AAGGACGGTC CGGAATGGGA CCTTTGA 1077

(35) INFORMATION FOR SEQ ID NO: 34:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 358 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:34: 

Met Ser Val Cys Tyr Arg Pro Pro Gly Asn Glu Thr Leu Leu Ser Trp
1 5 10 15

Lys Thr Ser Arg Ala Thr Gly Thr Ala Phe Leu Leu Leu Ala Ala Leu
20 25 30

Leu Gly Leu Pro Gly Asn Gly Phe Val Val Trp Ser Leu Ala Gly Trp
35 40 45

Arg Pro Ala Arg Gly Arg Pro Leu Ala Ala Thr Leu Val Leu His Leu
50 55 60

Ala Leu Ala Asp Gly Ala Val Leu Leu Leu Thr Pro Leu Phe Val Ala
65 70 75 80

Phe Leu Thr Arg Gln Ala Trp Pro Leu Gly Gln Ala Gly Cys Lys Ala
85 90 95

Val Tyr Tyr Val Cys Ala Leu Ser Met Tyr Ala Ser Val Leu Leu Thr
100 105 110

Gly Leu Leu Ser Leu Gln Arg Cys Leu Ala Val Thr Arg Pro Phe Leu
115 120 125

Ala Pro Arg Leu Arg Ser Pro Ala Leu Ala Arg Arg Leu Leu Leu Ala
130 135 140

Val Trp Leu Ala Ala Leu Leu Leu Ala Val Pro Ala Ala Val Tyr Arg
145 150 155 160

His Leu Trp Arg Asp Arg Val Cys Gln Leu Cys His Pro Ser Pro Val
165 170 175

His Ala Ala Ala His Leu Ser Leu Glu Thr Leu Thr Ala Phe Val Leu
180 185 190

Pro Phe Gly Leu Met Leu Gly Cys Tyr Ser Val Thr Leu Ala Arg Leu
195 200 205

Arg Gly Ala Arg Trp Gly Ser Gly Arg His Gly Ala Arg Val Gly Arg
210 215 220

Leu Val Ser Ala Ile Val Leu Ala Phe Gly Leu Leu Trp Ala Pro Tyr
225 230 235 240

His Ala Val Asn Leu Leu Gln Ala Val Ala Ala Leu Ala Pro Pro Glu
245 250 255

Gly Ala Leu Ala Lys Leu Gly Gly Ala Gly Gln Ala Ala Arg Ala Gly
260 265 270

Thr Thr Ala Leu Ala Phe Phe Ser Ser Ser Val Asn Pro Val Leu Tyr
275 280 285

Val Phe Thr Ala Gly Asp Leu Leu Pro Arg Ala Gly Pro Arg Phe Leu
290 295 300

Thr Arg Leu Phe Glu Gly Ser Gly Glu Ala Arg Gly Gly Gly Arg Ser
305 310 315 320

Arg Glu Gly Thr Met Glu Leu Arg Thr Thr Pro Gln Leu Lys Val Val
325 330 335

Gly Gln Gly Arg Gly Asn Gly Asp Pro Gly Gly Gly Met Glu Lys Asp
340 345 350

Gly Pro Glu Trp Asp Leu
355

(36) INFORMATION FOR SEQ ID NO: 35: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1005 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35: 

ATGCTGGGGA TCATGGCATG GAATGCAACT TGCAAAAACT GGCTGGCAGC AGAGGCTGCC 60
CTGGAAAAGT ACTACCTTTC CATTTTTTAT GGGATTGAGT TCGTTGTGGG AGTCCTTGGA 120
AATACCATTG TTGTTTACGG CTACATCTTC TCTCTGAAGA ACTGGAACAG CAGTAATATT 180
TATCTCTTTA ACCTCTCTGT CTCTGACTTA GCTTTTCTGT GCACCCTCCC CATGCTGATA 240
AGGAGTTATG CCAATGGAAA CTGGATATAT GGAGACGTGC TCTGCATAAG CAACCGATAT 300
GTGCTTCATG CCAACCTCTA TACCAGCATT CTCTTTCTCA CTTTTATCAG CATAGATCGA 360
TACTTGATAA TTAAGTATCC TTTCCGAGAA CACCTTCTGC AAAAGAAAGA GTTTGCTATT 420
TTAATCTCCT TGGCCATTTG GGTTTTAGTA ACCTTAGAGT TACTACCCAT ACTTCCCCTT 480
ATAAATCCTG TTATAACTGA CAATGGCACC ACCTGTAATG ATTTTGCAAG TTCTGGAGAC 540
CCCAACTACA ACCTCATTTA CAGCATGTGT CTAACACTGT TGGGGTTCCT TATTCCTCTT 600
TTTGTGATGT GTTTCTTTTA TTACAAGATT GCTCTCTTCC TAAAGCAGAG GAATAGGCAG 660
GTTGCTACTG CTCTGCCCCT TGAAAAGCCT CTCAACTTGG TCATCATGGC AGTGGTAATC 720
TTCTCTGTGC TTTTTACACC CTATCACGTC ATGCGGAATG TGAGGATCGC TTCACGCCTG 780
GGGAGTTGGA AGCAGTATCA GTGCACTCAG GTCGTCATCA ACTCCTTTTA CATTGTGACA 840
CGGCCTTTGG CCTTTCTGAA CAGTGTCATC AACCCTGTCT TCTATTTTCT TTTGGGAGAT 900
CACTTCAGGG ACATGCTGAT GAATCAACTG AGACACAACT TCAAATCCCT TACATCCTTT 960

AGCAGATGGG CTCATGAACT CCTACTTTCA TTCAGAGAAA AGTGA 1005

(37) INFORMATION FOR SEQ ID NO: 36:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 334 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36:

Met Leu Gly Ile Met Ala Trp Asn Ala Thr Cys Lys Asn Trp Leu Ala
1 5 10 15

Ala Glu Ala Ala Leu Glu Lys Tyr Tyr Leu Ser Ile Phe Tyr Gly Ile
20 25 30

Glu Phe Val Val Gly Val Leu Gly Asn Thr Ile Val Val Tyr Gly Tyr
35 40 45

Ile Phe Ser Leu Lys Asn Trp Asn Ser Ser Asn Ile Tyr Leu Phe Asn
50 55 60

Leu Ser Val Ser Asp Leu Ala Phe Leu Cys Thr Leu Pro Met Leu Ile
65 70 75 80

Arg Ser Tyr Ala Asn Gly Asn Trp Ile Tyr Gly Asp Val Leu Cys Ile
85 90 95

Ser Asn Arg Tyr Val Leu His Ala Asn Leu Tyr Thr Ser Ile Leu Phe
100 105 110

Leu Thr Phe Ile Ser Ile Asp Arg Tyr Leu Ile Ile Lys Tyr Pro Phe
115 120 125

Arg Glu His Leu Leu Gln Lys Lys Glu Phe Ala Ile Leu Ile Ser Leu
130 135 140

Ala Ile Trp Val Leu Val Thr Leu Glu Leu Leu Pro Ile Leu Pro Leu
145 150 155 160

Ile Asn Pro Val Ile Thr Asp Asn Gly Thr Thr Cys Asn Asp Phe Ala
165 170 175

Ser Ser Gly Asp Pro Asn Tyr Asn Leu Ile Tyr Ser Met Cys Leu Thr
180 185 190

Leu Leu Gly Phe Leu Ile Pro Leu Phe Val Met Cys Phe Phe Tyr Tyr
195 200 205

Lys Ile Ala Leu Phe Leu Lys Gln Arg Asn Arg Gln Val Ala Thr Ala
210 215 220

Leu Pro Leu Glu Lys Pro Leu Asn Leu Val Ile Met Ala Val Val Ile
225 230 235 240

Phe Ser Val Leu Phe Thr Pro Tyr His Val Met Arg Asn Val Arg Ile
245 250 255

Ala Ser Arg Leu Gly Ser Trp Lys Gln Tyr Gln Cys Thr Gln Val Val
260 265 270

Ile Asn Ser Phe Tyr Ile Val Thr Arg Pro Leu Ala Phe Leu Asn Ser
275 280 285

Val Ile Asn Pro Val Phe Tyr Phe Leu Leu Gly Asp His Phe Arg Asp
290 295 300

Met Leu Met Asn Gln Leu Arg His Asn Phe Lys Ser Leu Thr Ser Phe
305 310 315 320

Ser Arg Trp Ala His Glu Leu Leu Leu Ser Phe Arg Glu Lys
325 330

(38) INFORMATION FOR SEQ ID NO: 37:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1296 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37: 

ATGCAGGCGC TTAACATTAC CCCGGAGCAG TTCTCTCGGC TGCTGCGGGA CCACAACCTG 60
ACGCGGGAGC AGTTCATCGC TCTGTACCGG CTGCGACCGC TCGTCTACAC CCCAGAGCTG 120
CCGGGACGCG CCAAGCTGGC CCTCGTGCTC ACCGGCGTGC TCATCTTCGC CCTGGCGCTC 180
TTTGGCAATG CTCTGGTGTT CTACGTGGTG ACCCGCAGCA AGGCCATGCG CACCGTCACC 240
AACATCTTTA TCTGCTCCTT GGCGCTCAGT GACCTGCTCA TCACCTTCTT CTGCATTCCC 300
GTCACCATGC TCCAGAACAT TTCCGACAAC TGGCTGGGGG GTGCTTTCAT TTGCAAGATG 360
GTGCCATTTG TCCAGTCTAC CGCTGTTGTG ACAGAAATGC TCACTATGAC CTGCATTGCT 420
GTGGAAAGGC ACCAGGGACT TGTGCATCCT TTTAAAATGA AGTGGCAATA CACCAACCGA 480
AGGGCTTTCA CAATGCTAGG TGTGGTCTGG CTGGTGGCAG TCATCGTAGG ATCACCCATG 540
TGGCACGTGC AACAACTTGA GATCAAATAT GACTTCCTAT ATGAAAAGGA ACACATCTGC 600

TGCTTAGAAG AGTGGACCAG CCCTGTGCAC CAGAAGATCT ACACCACCTT CATCCTTGTC 660 
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ATCCTCTTC7C TCCTGCCTCT TATGGTGATG mmCTCT ACAGTAAAAT TO „ „ 0 
CTTTGGATAA AGAAAAGAGT TGGGGATGGT - maTSCTTC ^ TOGAJmftOAA 
ATGTCCAAAA TAGCCAGGAA GAAGAAACGA GC„A TGATGGTGAC AGTGGTGGGT .„ 
™ T G TGTGCTGGGC ACCATTCCAT GTTGTCCATA TGATGATTGA ATACAGTAAT M0 
—G G AATATGATGA TGTCACAATC AAGATGATTT TTGCTATCGT GGAA^ „ 0 
GGATTTTCCA ACTCGATCTG TAATCCCATT GTCTATGCAT TTATGAATGA AAACTTCAAA1 0^0 
AAAAATGTTT 'ravr-vc™™ -t-t—t^ 

' 1 ^ ^ ATA — ™ AAACCTTCTC 

AGGCATGCAA ATTCAGGAAT TACAATCATr rv^n, 

TACAATGATG CGGAAGAAAG CAAAGTTTTC CCTCAGAGAGJ 14 0 

AATCCAGTGG AGGAAACCAA „GCA TTCAGTGATG GCAACATTGA AGTCAAATTG1 2 0 0 

CAG " G — — TTGCTCTCTT 
CTGGCTGAGA ATTCTCCTTT AGACAGTGGG CATTAA 
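
The relationship between the nucleotide listing of SEQ ID NO: 37 and the amino acid listing of SEQ ID NO: 38 can be illustrated with a short translation sketch. It is an illustration only, not part of the filing: it assumes the standard genetic code and reading frame 1, and the names `translate`, `FIRST_60` and `LAST_36` are hypothetical. The two fragments are the legible first 60 bases and last 36 bases of SEQ ID NO: 37 as printed above.

```python
# Standard genetic code, codons enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# One-letter to three-letter codes, matching the notation of the listing.
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def translate(dna):
    """Translate DNA in frame 1, stopping at the first stop codon."""
    dna = "".join(dna.split()).upper()  # drop the spaces between 10-base groups
    residues = []
    for pos in range(0, len(dna) - len(dna) % 3, 3):
        amino = CODON_TABLE[dna[pos:pos + 3]]
        if amino == "*":
            break
        residues.append(THREE_LETTER[amino])
    return residues


# Bases 1-60 and 1261-1296 of SEQ ID NO: 37 (both start on a codon boundary).
FIRST_60 = "ATGCAGGCGC TTAACATTAC CCCGGAGCAG TTCTCTCGGC TGCTGCGGGA CCACAACCTG"
LAST_36 = "CTGGCTGAGA ATTCTCCTTT AGACAGTGGG CATTAA"

print(" ".join(translate(FIRST_60)))
print(" ".join(translate(LAST_36)))
```

Under these assumptions, 1296 bases correspond to 432 codons, that is 431 residues plus a stop codon, which agrees with the length stated for SEQ ID NO: 38.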

(39) INFORMATION FOR SEQ ID NO: 38: 



1296 






(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 431 amino acids
(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 38:

Met Gln Ala Leu Asn Ile Thr Pro Glu Gln Phe Ser Arg Leu Leu Arg
1 5 10 15

Asp His Asn Leu Thr Arg Glu Gln Phe Ile Ala Leu Tyr Arg Leu Arg
20 25 30

Pro Leu Val Tyr Thr Pro Glu Leu Pro Gly Arg Ala Lys Leu Ala Leu
35 40 45

Val Leu Thr Gly Val Leu Ile Phe Ala Leu Ala Leu Phe Gly Asn Ala
50 55 60

Leu Val Phe Tyr Val Val Thr Arg Ser Lys Ala Met Arg Thr Val Thr
65 70 75 80

Asn Ile Phe Ile Cys Ser Leu Ala Leu Ser Asp Leu Leu Ile Thr Phe
85 90 95

Phe Cys Ile Pro Val Thr Met Leu Gln Asn Ile Ser Asp Asn Trp Leu
100 105 110


Gly Gly Ala Phe Ile Cys Lys Met Val Pro Phe Val Gln Ser Thr Ala
115 120 125

Val Val Thr Glu Met Leu Thr Met Thr Cys Ile Ala Val Glu Arg His
130 135 140

Gln Gly Leu Val His Pro Phe Lys Met Lys Trp Gln Tyr Thr Asn Arg
145 150 155 160

Arg Ala Phe Thr Met Leu Gly Val Val Trp Leu Val Ala Val Ile Val
165 170 175

Gly Ser Pro Met Trp His Val Gln Gln Leu Glu Ile Lys Tyr Asp Phe
180 185 190

Leu Tyr Glu Lys Glu His Ile Cys Cys Leu Glu Glu Trp Thr Ser Pro
195 200 205

Val His Gln Lys Ile Tyr Thr Thr Phe Ile Leu Val Ile Leu Phe Leu
210 215 220

Leu Pro Leu Met Val Met Leu Ile Leu Tyr Ser Lys Ile Gly Tyr Glu
225 230 235 240

Leu Trp Ile Lys Lys Arg Val Gly Asp Gly Ser Val Leu Arg Thr Ile
245 250 255

His Gly Lys Glu Met Ser Lys Ile Ala Arg Lys Lys Lys Arg Ala Val
260 265 270

Ile Met Met Val Thr Val Val Ala Leu Phe Ala Val Cys Trp Ala Pro
275 280 285

Phe His Val Val His Met Met Ile Glu Tyr Ser Asn Phe Glu Lys Glu
290 295 300

Tyr Asp Asp Val Thr Ile Lys Met Ile Phe Ala Ile Val Gln Ile Ile
305 310 315 320

Gly Phe Ser Asn Ser Ile Cys Asn Pro Ile Val Tyr Ala Phe Met Asn
325 330 335

Glu Asn Phe Lys Lys Asn Val Leu Ser Ala Val Cys Tyr Cys Ile Val
340 345 350

Asn Lys Thr Phe Ser Pro Ala Gln Arg His Gly Asn Ser Gly Ile Thr
355 360 365

Met Met Arg Lys Lys Ala Lys Phe Ser Leu Arg Glu Asn Pro Val Glu
370 375 380

Glu Thr Lys Gly Glu Ala Phe Ser Asp Gly Asn Ile Glu Val Lys Leu
385 390 395 400

Cys Glu Gln Thr Glu Glu Lv- T v- i ^„ t

4 0S " ' " ° U Al " 9 Hic ^ Ala Leu

410 415

Sei " G 4 l U Q L ^ Al * ^ ^ ^ *ro Leu Asp Sei Gl y His
u 425

430

(40) INFORMATION FOR SEQ ID NO: 39:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 24 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39:
CTGTGTACAG CAGTTCGCAG AGTG 24
(41) INFORMATION FOR SEQ ID NO: 40:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 24 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 40:
GAGTGCCAGG CAGAGCAGGT AGAC
(42) INFORMATION FOR SEQ ID NO: 41:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 31 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 41:
CCCGAATTCC TGCTTGCTCC CAGCTTGGCC C
(43) INFORMATION FOR SEQ ID NO: 42:

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(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 32 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iv) ANTI-SENSE: YES

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 42:
TGTGGATCCT GCTGTCAAAG GTCCCATTCC GG
(44) INFORMATION FOR SEQ ID NO: 43:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid
(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 43:
TCACAATGCT AGGTGTGGTC 
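
Each primer entry states a LENGTH under item (A), so an entry can be sanity-checked by counting its bases. The snippet below is a minimal sketch of such a check, not part of the filing. The helper name `base_count` and the `ENTRIES` table are hypothetical, and the declared lengths and sequences are copied from the entries for SEQ ID NOS: 39 to 43.

```python
def base_count(sequence_line):
    """Count nucleotide letters, ignoring spaces and any trailing position number."""
    return sum(1 for ch in sequence_line.upper() if ch in "ACGT")


# SEQ ID NO -> (declared length in base pairs, sequence as printed)
ENTRIES = {
    39: (24, "CTGTGTACAG CAGTTCGCAG AGTG"),
    40: (24, "GAGTGCCAGG CAGAGCAGGT AGAC"),
    41: (31, "CCCGAATTCC TGCTTGCTCC CAGCTTGGCC C"),
    42: (32, "TGTGGATCCT GCTGTCAAAG GTCCCATTCC GG"),
    43: (20, "TCACAATGCT AGGTGTGGTC"),
}

for seq_id, (declared, sequence) in ENTRIES.items():
    counted = base_count(sequence)
    status = "ok" if counted == declared else "MISMATCH"
    print(f"SEQ ID NO: {seq_id}: declared {declared}, counted {counted} ({status})")
```

A mismatch would most likely point to a transcription error in either the LENGTH field or the sequence line.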
(45) INFORMATION FOR SEQ ID NO: 44:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 22 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(iv) ANTI-SENSE: YES

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 44:
TGCATAGACA ATGGGATTAC AG
(46) INFORMATION FOR SEQ ID NO: 45:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 511 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
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(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 45:

TCACAATGCT AGGTGTGGTC TGGCTGGTGG CAGTCATCGT AGGATCACCC ATGTGGCACG 60
TGCAACAACT TGAGATCAAA TATGACTTCC TATATGAAAA GGAACACATC TGCTGCTTAG 120
AAGAGTGGAC CAGCCCTGTG CACCAGAAGA TCTACACCAC CTTCATCCTT GTCATCCTCT 180
TCCTCCTGCC TCTTATGGTG ATGCTTATTC TGTACGTAAA ATTGGTTATG AACTTTGGAT 240
AAAGAAAAGA CTTGGGGATG GTTCAGTGCT TCGAACTATT CATGGAAAAG AAATGTCCAA 300
AATAGCCAGG AAGAAGAAAC GAGCTGTCAT TATGATGGTG ACAGTGGTGG CTCTCTTTGC 360
TGTGTGCTGG GCACCATTCC ATGTTGTCCA TATGATGATT GAATACAGTA ATTTTGAAAA 420
GGAATATGAT GATGTCACAA TCAAGATGAT TTTTGCTATC GTGCAAATTA TTGGATTTTC 480
CAACTCCATC TGTAATCCCA TTGTCTATGC A 511

(47) INFORMATION FOR SEQ ID NO: 46:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 21 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 46:
CTGCTTAGAA GAGTGGACCA G 21
(48) INFORMATION FOR SEQ ID NO: 47:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 22 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 47:

CTGTGCACCA GAAGATCTAC AC 22

(49) INFORMATION FOR SEQ ID NO: 48:

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 21 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(iv) ANTI-SENSE: YES

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 48:

CAAGGATGAA GGTGGTGTAG A 21

(50) INFORMATION FOR SEQ ID NO: 49:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 23 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(iv) ANTI-SENSE: YES

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 49:
GTGTAGATCT TCTGGTGCAC AGG 
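
Entries marked ANTI-SENSE: YES are written as the reverse complement of the coding strand. As a hedged illustration that is not part of the filing, the sketch below reverse-complements SEQ ID NO: 49 and looks for it in bases 601 to 660 of SEQ ID NO: 37. The helper `reverse_complement` and the variable names are hypothetical, and plain string containment stands in for a real alignment.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence):
    """Return the reverse complement of a DNA string, ignoring whitespace."""
    cleaned = "".join(sequence.split()).upper()
    return cleaned.translate(COMPLEMENT)[::-1]


SEQ_47 = "CTGTGCACCA GAAGATCTAC AC"    # ANTI-SENSE: NO
SEQ_49 = "GTGTAGATCT TCTGGTGCAC AGG"   # ANTI-SENSE: YES
# Bases 601-660 of SEQ ID NO: 37 as printed in the listing.
SEQ_37_601_660 = "TGCTTAGAAG AGTGGACCAG CCCTGTGCAC CAGAAGATCT ACACCACCTT CATCCTTGTC"

coding_strand = "".join(SEQ_37_601_660.split())
sense_of_49 = reverse_complement(SEQ_49)

print(sense_of_49)
print(sense_of_49 in coding_strand)             # antisense primer maps onto SEQ ID NO: 37
print("".join(SEQ_47.split()) in sense_of_49)   # it overlaps the sense primer SEQ ID NO: 47
```

Both checks succeed for the sequences as printed, which is consistent with SEQ ID NOS: 47 and 49 covering the same region of SEQ ID NO: 37 on opposite strands.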
(51) INFORMATION FOR SEQ ID NO: 50: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 50: 
GCAATGCAGG TCATAGTGAG C 



(52) INFORMATION FOR SEQ ID NO: 51: 
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(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 27 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: YES

(iv) ANTI-SENSE: YES

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 51:
TGGAGCATGG TGACGGGAAT GCAGAAG 27

(53) INFORMATION FOR SEQ ID NO: 52: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 27 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iv) ANTI-SENSE: YES

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 52:
GTGATGAGCA GGTCACTGAG CGCCAAG 27

(54) INFORMATION FOR SEQ ID NO: 53: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 23 base pairs

(B) TYPE: nucleic acid
(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 53:
GCAATGCAGG CGCTTAACAT TAC 23

(55) INFORMATION FOR SEQ ID NO: 54: 

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 22 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(iv) ANTI-SENSE: YES

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 54: 
TTGGGTTACA ATCTGAAGGG CA 22 

(56) INFORMATION FOR SEQ ID NO: 55: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 55:
ACTCCGTGTC CAGCAGGACT CTG 23

(57) INFORMATION FOR SEQ ID NO: 56:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 24 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(iv) ANTI-SENSE: YES

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 56: 
TGCGTGTTCC TGGACCCTCA CGTG 24 

(58) INFORMATION FOR SEQ ID NO: 57:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 29 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 57:
CAGGCCTTCG ATTTTAATCT CAGGGATGG
(59) INFORMATION FOR SEQ ID NO: 58:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 27 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(iv) ANTI-SENSE: YES

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 58:
GGAGAGTCAG CTCTGAAAGA ATTCAGG
(60) INFORMATION FOR SEQ ID NO: 59:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 27 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 59:
TGATGTGATG CCAGATACTA ATAGCAC
(61) INFORMATION FOR SEQ ID NO: 60:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 27 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iv) ANTI-SENSE: YES
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 60:

CCTGATTCAT TTAGGTGAGA TTGAGAC

(62) INFORMATION FOR SEQ ID NO: 61: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 22 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iv) ANTI-SENSE: NO
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(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 61: 

GACAGGTACC TTGCCATCAA G 

(63) INFORMATION FOR SEQ ID NO: 62: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 22 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(iv) ANTI-SENSE: YES
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(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 62: 

CTGCACAATG CCAGTGATAA GG 

(64) INFORMATION FOR SEQ ID NO: 63: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 27 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(iv) ANTI-SENSE: NO
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 63:
CTGACTTCTT GTTCCTGGCA GCAGCGG
(65) INFORMATION FOR SEQ ID NO: 64:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 27 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(iv) ANTI-SENSE: YES

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 64:
AGACCAGCCA GGGCACGCTG AAGAGTG

(66) INFORMATION FOR SEQ ID NO: 65: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 32 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 65:
GATCAAGCTT CCATCCTACT GAAACCATGG TC
(67) INFORMATION FOR SEQ ID NO: 66:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 35 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(iv) ANTI-SENSE: YES

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 66:
GATCAGATCT CAGTTCCAAT ATTCACACCA CCGTC
(68) INFORMATION FOR SEQ ID NO: 67:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 22 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 67:
CTGGTGTGCT CCATGGCATC CC 22
(69) INFORMATION FOR SEQ ID NO: 68:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 22 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iv) ANTI-SENSE: YES

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 68:
GTAAGCCTCC CAGAACGAGA GG 22
(70) INFORMATION FOR SEQ ID NO: 69:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 24 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 69:
CAGCGCAGGG TGAAGCCTGA GAGC
(71) INFORMATION FOR SEQ ID NO: 70: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 24 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iv) ANTI-SENSE: YES

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 70:
GGCACCTGCT GTGACCTGTG CAGG
(72) INFORMATION FOR SEQ ID NO: 71:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 22 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 71:
GTCCTGCCAC TTCGAGACAT GG
(73) INFORMATION FOR SEQ ID NO: 72:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 23 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(iv) ANTI-SENSE: YES

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 72:
GAAACTTCTC TGCCCTTACC GTC
(74) INFORMATION FOR SEQ ID NO: 73:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 26 base pairs

(B) TYPE: nucleic acid
(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(iv) ANTI-SENSE: NO
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 73:
CCAACACCAG CATCCATGGC ATCAAG 26

(75) INFORMATION FOR SEQ ID NO: 74: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 27 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(iv) ANTI-SENSE: YES

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 74:
GGAGAGTCAG CTCTGAAAGA ATTCAGG

This International Search Report has not been established in respect of certain claims under Article 17(2)(a) for the following reasons:

1. [ ] Claims Nos.:
because they relate to subject matter not required to be searched by this Authority, namely:

2. [ ] Claims Nos.:
because they relate to parts of the International Application that do not comply with the prescribed requirements to such an extent that no meaningful International Search can be carried out, specifically:

3. [ ] Claims Nos.:
because they are dependent claims and are not drafted in accordance with the second and third sentences of Rule 6.4(a).

Box II Observations where unity of invention is lacking (Continuation of item 2 of first sheet)

This International Searching Authority found multiple inventions in this international application, as follows:

see additional sheet

1. [X] As all required additional search fees were timely paid by the applicant, this International Search Report covers all searchable claims.

2. [ ] As all searchable claims could be searched without effort justifying an additional fee, this Authority did not invite payment of any additional fee.

3. [ ] As only some of the required additional search fees were timely paid by the applicant, this International Search Report covers only those claims for which fees were paid, specifically claims Nos.:

4. [ ] No required additional search fees were timely paid by the applicant. Consequently, this International Search Report is restricted to the invention first mentioned in the claims; it is covered by claims Nos.:

Remark on Protest [ ] The additional search fees were accompanied by the applicant's protest.
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This International Searching Authority found multiple (groups of) 
inventions in this international application, as follows: 

1. Claims: 1-4 

Human G protein-coupled receptor as characterized by 
SEQ. ID. 2, a cDNA encoding said receptor as characterized by 
SEQ. ID. 1, a plasmid comprising said cDNA, and a host cell 
comprising said plasmid. 



2. Claims: 5-8 

Human G protein-coupled receptor as characterized by 
SEQ. ID. 4, a cDNA encoding said receptor as characterized by 
SEQ. ID. 3, a plasmid comprising said cDNA, and a host cell 
comprising said plasmid. 



3. Claims: 9-12 

Human G protein-coupled receptor as characterized by 
SEQ. ID. 6, a cDNA encoding said receptor as characterized by 
SEQ.ID.5, a plasmid comprising said cDNA, and a host cell 
comprising said plasmid. 



4. Claims: 13-16 

Human G protein-coupled receptor as characterized by 
SEQ. ID. 8, a cDNA encoding said receptor as characterized by 
SEQ. ID. 7, a plasmid comprising said cDNA, and a host cell 
comprising said plasmid. 



5. Claims: 17-20 

Human G protein-coupled receptor as characterized by 
SEQ.ID.10, a cDNA encoding said receptor as characterized by 
SEQ. ID. 9, a plasmid comprising said cDNA, and a host cell 
comprising said plasmid. 



6. Claims: 21-24 

Human G protein-coupled receptor as characterized by 
SEQ. ID. 12, a cDNA encoding said receptor as characterized by
SEQ. ID. 11, a plasmid comprising said cDNA, and a host cell 
comprising said plasmid. 



7. Claims: 25-28 

Human G protein-coupled receptor as characterized by 

SEQ. ID. 14, a cDNA encoding said receptor as characterized by 
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SEQ.ID.13, a plasmid comprising said cDNA, and a host cell 
comprising said plasmid. 



8. Claims: 29-32 

Human G protein-coupled receptor as characterized by 
SEQ.ID.16, a cDNA encoding said receptor as characterized by 
SEQ.ID.15, a plasmid comprising said cDNA, and a host cell 
comprising said plasmid. 



9. Claims: 33-36 

Human G protein-coupled receptor as characterized by 
SEQ.ID.18, a cDNA encoding said receptor as characterized by 
SEQ.ID.17, a plasmid comprising said cDNA, and a host cell 
comprising said plasmid. 



10. Claims: 37-40 

Human G protein-coupled receptor as characterized by 
SEQ.ID.20, a cDNA encoding said receptor as characterized by 
SEQ.ID.19, a plasmid comprising said cDNA, and a host cell 
comprising said plasmid. 



11. Claims: 41-44 

Human G protein-coupled receptor as characterized by 
SEQ.ID.22, a cDNA encoding said receptor as characterized by 
SEQ.ID.21, a plasmid comprising said cDNA, and a host cell 
comprising said plasmid. 



12. Claims: 45-48 

Human G protein-coupled receptor as characterized by 
SEQ.ID.24, a cDNA encoding said receptor as characterized by 
SEQ.ID.23, a plasmid comprising said cDNA, and a host cell 
comprising said plasmid. 



13. Claims: 49-52 

Human G protein-coupled receptor as characterized by 
SEQ.ID.26, a cDNA encoding said receptor as characterized by 
SEQ.ID.25, a plasmid comprising said cDNA, and a host cell 
comprising said plasmid. 



14. Claims: 53-56 

Human G protein-coupled receptor as characterized by 
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SEQ.ID.28, a cDNA encoding said receptor as characterized by 
SEQ.ID.27, a plasmid comprising said cDNA, and a host cell 
comprising said plasmid. 



15. Claims: 57-60 



Human G protein-coupled receptor as characterized by 
SEQ.ID.30, a cDNA encoding said receptor as characterized by
SEQ.ID.29, a plasmid comprising said cDNA, and a host cell
comprising said plasmid. 



16. Claims: 61-64 



Human G protein-coupled receptor as characterized by 
SEQ.ID.32, a cDNA encoding said receptor as characterized by 
SEQ.ID.31, a plasmid comprising said cDNA, and a host cell
comprising said plasmid. 



17. Claims: 65-68 



Human G protein-coupled receptor as characterized by 
SEQ.ID.34, a cDNA encoding said receptor as characterized by 
SEQ.ID.33, a plasmid comprising said cDNA, and a host cell
comprising said plasmid. 



18. Claims: 69-72 



Human G protein-coupled receptor as characterized by 
SEQ.ID.36, a cDNA encoding said receptor as characterized by 
SEQ.ID.35, a plasmid comprising said cDNA, and a host cell 
comprising said plasmid. 



19. Claims: 73-76 



Human G protein-coupled receptor as characterized by 
SEQ.ID.38, a cDNA encoding said receptor as characterized by 
SEQ.ID.37, a plasmid comprising said cDNA, and a host cell 
comprising said plasmid. 
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